1 Introduction


The complex interrelationships of computer systems within the
BM/C3 setting impose stringent requirements on their performance.
They must reliably produce correct results within a minimal period of
time and without exorbitant demands upon external resources. At the
same time, they must be capable of flexible and dynamic response to
changes in the processing environment, adapting quickly to
fluctuations in communications, threat assessment, resource
availability, and so forth. This need for intelligent and adaptable
behavior indicates that the integration of artificial intelligence
algorithms may provide significant enhancements in the behavior of
BM/C3 systems.

The history of poor performance demonstrated by past AI systems
has made real-time behavior an issue of concern. Can optimization
techniques be systematically applied to AI programs in order to bring
their performance up to real-time standards? Does any such
improvement presuppose the development of new hardware and/or software
capabilities? The present report discusses the feasibility of
optimizing AI algorithms using currently available resources and
methodologies and proposes a strategy for maximizing improvement
potential.

The issues involved in optimizing AI programs can best be
understood if we model the performance of an AI algorithm in a
real-time situation as a series of problem solution systems operating
concurrently at different levels of abstraction (see Figure 1). At
the highest level is a logical description of the problem and the
steps required to solve it. At the lowest level is a physical system,
the architecture configuration executing the solution. Between the
two lies an implementation system which provides the interface between
logical conceptualization and physical reality.

Figure 1. Program Performance Modeled as
Concurrent Problem Solution Systems
(Labels shown in the figure: Logical level, interface, Physical level;
Problem, Solution.)

An AI algorithm can be optimized at any or all of these levels.
The  logical  system  is  improved  by  devising  a  more  efficient  conceptual 
solution  to  the  problem.  This  may  involve  a  decrease  in  the  amount  of 
processing  required,  as  is  the  case  with  the  development  of  search 
tree  pruning  techniques,  more  effective  heuristic  functions  for 
evaluating progress toward a goal, or methods of minimizing the need
for  data  retrieval.  The  logical  solution  may  also  be  streamlined  by 
the  development  of  faster  processing  techniques  such  as  improved 
methods  of  discrimination  or  better  data  structures  for  problem 
representation. 

The  execution  solution  system  is  improved  by  enhancements  to  the 
physical resources. The architectural configuration processes each
operation  by  first  analyzing  the  memory,  processor,  and 
interconnection  network  elements  required  and  then  controlling  the 
sequencing  of  those  elements.  Improvements  include  the  use  of  faster 
processors,  lookahead  control  capabilities,  and  configurations 
allowing  speedier  data  retrieval,  as  well  as  the  most  obvious 
enhancement,  the  distribution  of  execution  over  a  series  of  parallel 
processors. 

The intervening system, the implementation, serves to bridge the
"semantic gap" between the expression of the problem as a set of
conceptual relationships and as a series of operations to compute
those relations. A certain degree of inefficiency is, of course,
inherent in any situation where a logical solution must be mapped to a
physical one. By viewing the interface system as a separate entity,
however, we can attempt to improve its transformations as a means of
generally enhancing the performance of the algorithm. This process
involves isolating those aspects of the description system which have
most impact on execution and establishing a means of minimizing
transformational inefficiencies.

Most recent research efforts in the improvement of AI programs
have been devoted to the description and execution solution systems.
The current study addresses the feasibility of optimization at the
implementation system level by exploring the logical/physical
interface and its performance implications. Chapter 2 presents an
overview of the implementation solution system and describes the
issues involved in assessing and enhancing program performance. This
establishes a foundation for the next chapter, which discusses the
limitations imposed by the processing environment and presents
comparative studies of program optimization in two general
environments, sequential and applicative. Chapter 4 introduces the
concept of environment spanning, a strategy which seeks to maximize
the "optimizability" of AI algorithms by partitioning programs into
segments for coordinated processing in a heterogeneous environment. A
final chapter summarizes the conclusions of the study. The appendices
provide supplementary information on optimization in the sequential
and applicative environments by cataloging the optimizing
transformations typically applied in each and summarizing empirical
studies of the effects of optimization on program performance.


2  Implementation  of  AI  Algorithms 


The  artificial  intelligence  algorithms  developed  for  battle 
management  applications  are  subject  to  strict  prerequisites.  They 
involve  not  only  an  unusual  degree  of  numerical  computation,  but  also 
rigid performance constraints imposed by the real-time nature of BM/C3
systems. AI programs in this setting are sophisticated and often
large  in  scale,  requiring  extensive  supplementary  databases  which 
guide  inference  and  discrimination  systems  or  assist  in  calculating 
heuristic  evaluations  of  goal  proximity,  resource  adequacy,  trajectory 
fit, etc. The size and complexity of these programs relegate them to
a long-term development framework. This has the effect of imposing
the additional restriction that the nature and amount of processing
required be predictable, at least to the extent that there is some
assurance  the  algorithm  can  eventually  perform  in  real-time 
situations.  It  must  be  possible  to  simulate  or  otherwise  analyze  the 
selected  implementation  strategy,  predict  the  impact  of  external 
system  conditions,  and  guarantee  that  performance  can  meet  stringent 
time  constraints. 

As  outlined  in  the  preceding  chapter,  the  implementation  system 
provides  an  interface  between  the  conceptualization  of  the  problem 
solution  and  the  hardware-specific  processing.  It  encompasses  a 
number  of  levels  and  types  of  elements,  including  the  goals  and 
subgoals selected for the implementation, the programming language in
which the solution is expressed, the translating systems used to map
the program to target machine instructions, and any run-time
environment layers which isolate the user program from the hardware.
All of
these  influence  run-time  behavior,  but  the  nature  and  extent  of  their 
effects  vary  significantly. 

The feasibility of performance improvement in the implementation
system  depends  to  a  great  extent  on  two  factors:  (1)  the  ability  to 
accurately  predict  system  performance,  particularly  in  the  sense  of 
identifying  those  features  responsible  for  degradation;  and  (2)  the 
capability  of  applying  some  sort  of  optimizing  transformations  which 
enhance  the  predicted  behavior.  This  chapter  addresses  these  issues 
by examining the general nature of program behavior, behavior
prediction, and optimization techniques. System characteristics which
impose  limitations  on  the  extent  of  program  improvement  feasible 
within  an  AI  framework  are  then  presented. 

Certain  assumptions  have  been  made  as  to  terminology.  The  term 
translator  refers  globally  to  system  software  which  maps  a  program  in 
one  language  (the  source)  to  another  (the  target).  These  include  — 
but  are  not  limited  to  —  compilers,  interpreters,  macroprocessors, 
and  assemblers.  Similarly,  a  translation  is  any  mapping  from  a  source 
to  a  target  language,  typically  a  one-to-many  transformation  (i.e., 
from  a  "higher"  to  a  "lower"  level).  Implementation  is  used  in 
reference  to  any  activity  within  the  implementation  solution  system; 
where  no  ambiguity  results,  the  terms  program  and  implementation 
system appear interchangeably. Finally, behavior and performance are
used synonymously to describe the observable aspects of execution such
as speed and resource utilization (as opposed to non-observable
features such as fault tolerance and correctness).


2.1  Fundamentals  of  Program  Behavior 

Although  the  notion  of  program  behavior  can  obviously  be 
approached  from  a  number  of  viewpoints,  it  is  convenient  for  our 
purposes  to  group  the  elements  of  the  implementation  solution  system 
into  two  classes.  The  first  includes  those  factors  relating  to  the 
processes  by  which  the  system  is  established,  while  the  second 
encompasses those influencing the way in which the mature system
behaves.  A  corresponding  distinction  is  drawn  between  efficacy  and 
efficiency  as  performance  metrics. 

Implementation efficacy measures the "effectiveness" of the
system: an effectual program correctly produces the desired effect
(without regard to exactly how the effect is achieved). Efficacy is
determined from the processes used to create the implementation rather
than from consideration of the way in which it functions.

The evolution of an implementation system is typically viewed as
a sequence of phases (see Figure 2): the definition of requirements,
establishing of specifications, design and development of the source
program, and translation into target form. Each transition from one
phase to the next requires an expansion of the problem representation
from a relatively abstract to a more concrete level. Associated with
each expansion is a series of constraints intended to guarantee the
fidelity and reliability of the mapping transformation. The
constraints imposed at early phases of development are generally
viewed as issues of validation (does the implementation do what is
intended?) and verification (does it produce the effect correctly?).
Efficacy factors such as these are of major concern in software
engineering, but are beyond the scope of the present work.

Figure 2. Predicting Program Efficacy

The  selection  of  a  source  language  and  the  translation  of  the 
resulting  program  are  relevant  to  any  discussion  of  performance 
improvement, but in terms of efficacy their influence is difficult to
measure. For reasons which should be obvious, subsequent sections
assume that: (1) the implementation languages selected provide for a
suitable  representation  of  the  algorithm;  (2)  the  program  faithfully 
describes  the  problem  solution;  and  (3)  the  translators  available 
provide  adequate  and  reliable  mappings  from  source  to  object  code. 

In contrast to efficacy, implementation efficiency measures
solution competency: an efficient program executes the task with a
minimum expenditure of physical resources (without concern for whether
or not the task correctly solves the problem). Efficiency issues,
therefore, derive from the manner in which the implementation
functions during program execution.

The behavior of an implementation system can be characterized as
a hierarchy of abstract or virtual machines representing logical
levels of functionality (described in detail in later sections). The
interface between one machine and the next consists of a one-to-many
mapping between instruction sets, with the final or lowest interface
providing a mapping to the machine code of the target architecture.
Implementation efficiency is a composite measure of the efficiency of
all these interfaces. It reflects the appropriateness of the target
architecture to the task, as well as the adequacy of the
transformations.
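
The layering described above can be made concrete with a small
sketch. The Python fragment below is illustrative only: the two
mapping tables, the instruction names, and the helper functions
expand and lower are hypothetical and do not correspond to any
particular translator. It merely shows how one-to-many mappings
compound as a source-level action descends through successive
virtual machines.

```python
# Hypothetical two-interface hierarchy; all instruction names are invented.
LAYERS = [
    # source level -> intermediate level (one-to-many)
    {"ASSIGN_SUM": ["LOAD", "LOAD", "ADD", "STORE"]},
    # intermediate level -> target machine code
    {"LOAD": ["mov"], "ADD": ["add"], "STORE": ["mov"]},
]


def expand(program, mapping):
    """Map each instruction to its lower-level sequence (unknown ones pass through)."""
    lowered = []
    for instr in program:
        lowered.extend(mapping.get(instr, [instr]))
    return lowered


def lower(program, layers):
    """Apply every interface mapping in order, from highest to lowest level."""
    for mapping in layers:
        program = expand(program, mapping)
    return program


source = ["ASSIGN_SUM"]
machine = lower(source, LAYERS)
# Ratio of machine instructions to source actions across all interfaces.
expansion_factor = len(machine) / len(source)
```

In this toy hierarchy a single source action becomes four machine
instructions; a poorer mapping at either interface, or a target
instruction set less suited to the task, would raise that ratio.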


2.2  Predicting  Performance 

The  developer  of  any  real-time  computational  system  makes  an 
implicit  assumption  that  behavior  can  be  predicted  prior  to  program 
execution  and  that  efficiency  is  not  only  an  attainable,  but  also  a 
measurable  goal.  It  should  be  noted  that  execution-time  behavior  in 
itself  may  not  be  a  sufficient  criterion  for  assessing  program 
performance.  If  the  program  forms  part  of  a  complex  software  system, 
additional factors such as reliability, maintainability,
modifiability, and transportability must be taken into account; in some cases
these  requirements  are  in  direct  conflict  with  the  goals  of  efficacy 
and/or  efficiency. 

A  program's  behavior  can  be  characterized  on  a  variety  of 
levels,  including  execution  speed,  amount  of  memory  occupied  by 
program  instructions,  data  storage  requirements,  number  and  type  of 
operations  performed,  need  for  exclusive  access  to  shared  devices,  and 
so forth.* Such measures are difficult to quantify and tend to be
distressingly sensitive to specific run-time conditions. In general,
however, they may be categorized as influencing three facets of
program efficiency: conciseness, speed, and throughput.

Conciseness measures the main and auxiliary memory space
required by the program for both code storage and data representation.
A program maximizes conciseness by occupying a minimal amount of
space. Speed provides a similar metric, expressed in terms of the
amount of time required for processing. Since this is influenced by
how much of the program is resident in main memory versus how much
must be swapped in and out of auxiliary storage, conciseness and speed
are inextricably related. The third facet, throughput, assesses
program productivity or output vis-a-vis input. This measure is
inversely related to the number of interrupts causing the suspension
of program execution. Although somewhat related to speed (a slow
program is more likely to be interrupted than a fast one), throughput
is more representative of external factors, such as the program's
requirements for system resources, file structure and data redundancy,
or available error handling facilities.

* The distinctions are not always clear; some program environments,
for example, do not differentiate between program data (instructions)
and problem data (variable storage). For purposes of clarity, these
issues are deferred until later chapters.

Although the nature of conciseness, speed, and throughput is
easily understood, their evaluation is difficult. The crux of the
problem is the identification of suitable standards for testing
efficiency. A software developer needs to predict the behavior of a
proposed implementation assuming it is designed and carried out with
"average" programming skill and is executed utilizing "average" data
on a system running under an "average" load. The use of the criterion
"average", however, while perfectly natural to the human designer, is
often a statistical impossibility in terms of the computational
system. The assessment of program performance is commonly reduced to
a discrimination process which selectively isolates specific examples
(benchmarks) from the program solution space, further restricts those
examples by the application of selected input (test cases), and
executes them under selected system conditions, often simulated (see
Figure 3). The resulting measurements are at best a rough
approximation and at worst a gross misrepresentation of real-time
conditions.
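
The selection process just described can be sketched as a small
harness. The fragment below is a minimal illustration using Python's
standard timeit module; the benchmark sum_loop, the input sizes, and
the function run_suite are assumptions introduced for this example
and are not drawn from the studies cited in this report. Note that
the system conditions are simply whatever the host happens to
provide, which is exactly the weakness discussed above.

```python
import timeit


def sum_loop(n):
    """A hypothetical benchmark: sum the integers below n."""
    total = 0
    for i in range(n):
        total += i
    return total


# Selected examples (benchmarks) and selected input (test cases).
BENCHMARKS = {"sum_loop": sum_loop}
TEST_CASES = [10, 1_000]


def run_suite(benchmarks, test_cases, repeat=3, number=10):
    """Return the best observed time per call for each (benchmark, input) pair."""
    results = {}
    for name, fn in benchmarks.items():
        for n in test_cases:
            timings = timeit.repeat(lambda: fn(n), repeat=repeat, number=number)
            # The minimum is the measurement least disturbed by system load.
            results[(name, n)] = min(timings) / number
    return results
```

Taking the minimum over several repetitions reduces the influence of
transient load, but it also moves the measurement further from the
"average" conditions the developer actually wishes to predict.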

The  selection  of  test  programs  commonly  entails  the  use  of  an 
evaluation suite. This may be composed of task-specific,
application-specific, or naturally-occurring benchmarks, or any
combination of the three. Task-specific benchmarks, which isolate
particular and presumably typical program activities for individual
measurement, are the performance evaluator's version of unit-level
testing. Although they can be a valuable source of data on speed and
throughput, task-specific tests are often misleading because they
cannot reflect system interactions or load conditions. Furthermore,
they are inordinately susceptible to "edge effects", or the
non-representative results which occur when a program slightly exceeds
or barely misses critical limits (such as memory page size or I/O
transmission bounds).
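
An edge effect at a page boundary can be illustrated with simple
arithmetic. The sketch below assumes a 4096-byte page, a figure chosen
for illustration and not taken from this report, and a hypothetical
helper pages_needed. A one-byte difference in program size doubles the
number of pages occupied, so two nearly identical benchmarks may show
quite different paging behavior.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (illustrative only)


def pages_needed(nbytes, page_size=PAGE_SIZE):
    """Number of whole pages required to hold nbytes (ceiling division)."""
    return -(-nbytes // page_size)


just_fits = pages_needed(4096)     # exactly one page
just_exceeds = pages_needed(4097)  # one byte more needs a second page
```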

To  a  certain  extent,  application-specific  benchmarks,  which 
attempt to provide more realistic measurements by modeling the
proposed solution on some reduced scale, compensate for those drawbacks.
They  are  prone  to  the  biases  typical  of  any  modeling  situation, 
however,  notably  oversimplification  of  the  problem  and  the  failure  to 
accurately  portray  the  effects  of  system  load.  Since  they  execute  on 
a  reduced  scale,  there  is  also  a  tendency  to  atypical  behavior,  such 
as  a  distorted  view  of  initiation  versus  execution  costs. 

Naturally-occurring  code  may  also  be  selected  for  benchmark 
analysis. In this case, existing programs which solve related
problems are utilized in an attempt to approximate "real-life" processing
conditions.  Although  these  are  somewhat  better  than  other  benchmarks 
in  reflecting  the  effects  of  system  load,  they  are  often  biased  toward 
a  specific  problem  or  implementation  system.  They  are  also  more 
likely  to  reflect  individual  levels  of  programming  skill  and/or 
system-dependent  optimizations. 

Occasionally,  criteria  other  than  benchmark  suites  are  used  to 
assess  program  behavior.  The  most  common  approach  is  to  evaluate  the 
mappings  which  provide  the  interface  from  one  abstract  machine  to  the 
next  by  identifying  the  number  and  type  of  instructions  generated  at 
each  level  and  ultimately  quantifying  the  time  required  to  execute  the 
physical machine instructions. The primary drawback of this
methodology is that it cannot adequately reflect external run-time
conditions influencing program behavior, such as system load, resource
contention problems, and communication blockages. Like task-specific
benchmark tests, this type of measurement is also susceptible to edge
effects.
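
A minimal sketch of this kind of instruction-counting estimate
follows. The cycle costs in CYCLES and the function estimate_cycles
are hypothetical values and names chosen for illustration; real
timings depend on the target architecture. As noted above, such a
static estimate says nothing about system load or resource
contention.

```python
from collections import Counter

# Assumed per-instruction costs in machine cycles (illustrative only).
CYCLES = {"mov": 1, "add": 1, "mul": 3, "div": 20}


def estimate_cycles(instructions, cycles=CYCLES):
    """Count each instruction type and weight it by its assumed cycle cost."""
    counts = Counter(instructions)
    return sum(cycles[op] * n for op, n in counts.items())


estimate = estimate_cycles(["mov", "mov", "mul", "add"])
```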


It  is  obvious  that  no  single  methodology  for  evaluating  program 
behavior  accurately  portrays  the  interrelationships  among  conciseness, 
speed, and throughput. The assessment of program behavior is still a
black art, and few guidelines are generally applicable. These caveats
should be borne in mind when later sections discuss the results of
benchmark  analysis. 

2.3  Improving  Performance 

The  performance  requirements  inherent  in  real-time  computation 
systems have engendered a widespread interest in the development of
techniques for program optimization. In general, optimization refers
to the transformation of a program implementation in order to improve
its execution-time performance. The term is often applied loosely to
any  translator  which  makes  a  significant  effort  to  generate  efficient 
target  code;  it  is,  however,  better  applied  to  specific  attempts  to 
rearrange  or  alter  program  operations  so  that  the  target  program  is 
more  efficient  than  that  generated  by  a  direct  translation. 

The  term  optimization  is  misleading  for  several  reasons.  First, 
the  notion  of  optimality  is  imprecise.  No  known  metric  suffices  to 
describe  behavior  dynamics  and,  as  indicated  in  previous  sections, 
performance in itself may not be a suitable criterion for evaluating
program "goodness". Furthermore, the term optimization implies that a
unique optimal solution exists and that it can be recognized as such.
This is extremely unlikely, in view of the complex interrelationships
among individual performance characteristics; optimizing
transformations can rarely be applied without ambiguity, and the
improvement of one aspect of a program's behavior can have adverse
effects on others. Finally, most existing techniques are applied on
the basis of pre-execution program analysis, restricting their
activities to those portions of the program which are not overly
dependent on run-time values.

In  spite  of  its  inappropriateness,  however,  optimization  is  the 
term  most  generally  used  in  reference  to  performance  improvement.  The 
present  report  will  use  the  words  optimization,  improvement,  and 
enhancement interchangeably, with the understanding that they
represent a relative rather than an absolute goal.

Central  to  the  idea  of  optimization  is  the  concept  of  program 
equivalence.  Since  any  number  of  programs  can  be  devised  to  produce 
identical  run-time  results,  the  goal  of  program  improvement  is  the 
generation  of  the  most  efficient  means  of  achieving  the  desired 
results.  Optimizing  transformations  thus  represent  automated  attempts 
to  improve  upon  the  programmer's  description  of  the  algorithm.  Such 
alterations are important in order to compensate for the
inefficiencies inherent in the use of high-level languages, which often suppress
those  details  of  the  object  language  having  most  influence  on  program 
performance. 


The  implementation  of  program  improvements  presupposes  the 
formulation  of  a  set  of  transformations  which  will  produce  a  program 
equivalent to the original. Each such transformation is described in
terms of the relationships between program elements which are
necessary preconditions, in combination with a meaning-preserving
transformation rule. Any constraints governing the order of applying
transformations must also be supplied. The situation is complicated
by the fact that relatively few optimizations are finite. If a
transformation can be repeatedly applied without ever reaching a point
where it is no longer possible to apply it, artificial boundary
conditions must be established to terminate the process. Additional
safeguards may need to be instituted in order to prevent conflicts
between individual transformation rules.
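
The pairing of a precondition with a rewriting rule, and the need for
an artificial bound, can be sketched as follows. The expression
encoding (nested tuples), the two rules, and the driver
apply_until_stable are all hypothetical and are offered only as an
illustration. The rule fold_add_zero is finite, since each application
shrinks the expression; the rule commute can be applied forever and
terminates only because of the max_passes bound.

```python
def fold_add_zero(expr):
    """Precondition: expr is ("add", x, 0). Rule: replace it by x."""
    if isinstance(expr, tuple) and expr[0] == "add" and expr[2] == 0:
        return expr[1]
    return None  # precondition not met


def commute(expr):
    """Precondition: expr is ("add", a, b). Rule: swap the operands."""
    if isinstance(expr, tuple) and expr[0] == "add":
        return ("add", expr[2], expr[1])
    return None


def apply_until_stable(expr, rules, max_passes=100):
    """Apply the first applicable rule per pass; stop when none applies
    or when the artificial bound max_passes is reached."""
    for _ in range(max_passes):
        for rule in rules:
            rewritten = rule(expr)
            if rewritten is not None:
                expr = rewritten
                break
        else:
            return expr  # no rule applied: a stable form was reached
    return expr  # bound reached: the process was terminated artificially
```

The order of the rules in the list acts as a simple priority scheme,
one possible way of resolving conflicts between rules whose
preconditions are satisfied at the same time.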

The concept of meaning-preserving transformations can most
easily be understood by viewing the execution of a particular program
as a sequence of actions (A1 ... An). For reasons which will be made
clear in later chapters, these actions should be considered abstractly
and not equated with program instructions; since each program action
is explicitly represented, there are no control constructs (e.g.,
branches, loops, etc.) in this representation. Figure 4 illustrates
the application of five commonly applied optimizing transformations to
such an execution sequence. Note that some of the improvements may be
contradictory. For example, the replacement of an action by a faster
equivalent may increase program storage space as it shortens execution
time. Similarly, the elimination of redundant calculations saves time
and program storage but may increase requirements for temporary data
storage.
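
The trade-off in the last example can be seen in a small sketch of
redundant-calculation elimination over a straight-line action
sequence. The action format (target, operation, operand, operand) and
the function eliminate_redundant are assumptions made for this
illustration, and the sketch further assumes that each target is
assigned only once, so that an earlier result remains valid.

```python
def eliminate_redundant(actions):
    """Replace a repeated (op, a, b) computation by a copy of the earlier result.

    Assumes single assignment: no target or operand is redefined later.
    """
    seen = {}  # (op, a, b) -> name of the target already holding that value
    improved = []
    for target, op, a, b in actions:
        key = (op, a, b)
        if key in seen:
            # Reuse the earlier result instead of recomputing it.
            improved.append((target, "copy", seen[key], None))
        else:
            seen[key] = target
            improved.append((target, op, a, b))
    return improved


ACTIONS = [
    ("t1", "mul", "x", "y"),
    ("t2", "add", "t1", "z"),
    ("t3", "mul", "x", "y"),  # redundant: same computation as t1
]
IMPROVED = eliminate_redundant(ACTIONS)
```

The second multiplication disappears, but the value held in t1 must
now be preserved until the copy is made, which is the increased
requirement for temporary data storage mentioned above.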

Improvement  strategies  can  be  classified  according  to  a  variety 
of criteria. As indicated in Figure 4, it is common to group
transformations according to the type of improvement effected (e.g.,
reduction in execution time, reduction of instruction storage,
reduction of data storage, reduction of I/O interfaces, etc.). Other
categorizations are based on technique applicability (machine
dependent or independent optimization), scope of improvement efforts
(local, global, interprocedural, etc.), number of applications (finite
vs. infinite transformations), and of course the particular type of
technique used (e.g., expression simplification, code rearrangement,
operation frequency reduction, peephole optimization).
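
As one concrete instance of the techniques named above, a peephole
optimization examines a small window of adjacent instructions and
removes or replaces wasteful patterns. The sketch below uses an
invented instruction format and a single hypothetical pattern (a push
of a register immediately followed by a pop into the same register);
it assumes that the pair has no other observable effect on the
machine.

```python
def peephole(code):
    """Remove each adjacent ("push", r), ("pop", r) pair in a single scan."""
    out = []
    for instr in code:
        if out and out[-1][0] == "push" and instr == ("pop", out[-1][1]):
            out.pop()  # the pair cancels: drop the push and skip the pop
        else:
            out.append(instr)
    return out


CODE = [("push", "r1"), ("pop", "r1"), ("add", "r1", "r2")]
SHORTER = peephole(CODE)
```

This transformation is local in scope, machine dependent, and finite,
since every application shortens the instruction sequence.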

An  alternative  approach  is  to  view  optimizations  in  terms  of  the 
implementation stages at which they are performed (Figure 5).
Source-level transformations are applied by a preprocessor which
generates an altered version of the source program, allowing machine
independent (but language dependent) improvement strategies.
Object-level transformations, applied by a postprocessor to the
machine code generated by the translator, are conversely machine
dependent and language independent. Most current implementations of
optimizers, however, are incorporated in compiler systems and operate
on some intermediate form of program representation. The
transformations are of varying degrees of language/machine dependence,
according to whether they are applied during the syntactic analysis,
semantic attribution, target code generation, or peephole optimization
(last stage of code generation) phase.

Figure 5. Implementation Stages at Which
Optimizing Transformations May Be Applied


To be effective, an optimization technique must result in
improved performance in all possible executions and must not change
either program behavior or results, even if execution is abnormally
terminated. In addition, it should be cost effective in terms of the
time required to perform the transformation compared to the
execution-time improvement which will be realized. Although ideally it
should be possible to apply transformations consecutively with no
ambiguity or conflict, in practice most improvement techniques are
complex, of uncertain duration, of limited applicability, and often
contradictory in nature and results.
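
The cost-effectiveness criterion can be expressed as a break-even
calculation. The sketch below is a simple illustration under stated
assumptions: all times are given in the same integer unit (say,
milliseconds), the saving per execution is constant, and the function
name break_even_runs is hypothetical.

```python
def break_even_runs(transform_cost, time_before, time_after):
    """Executions needed before the time spent transforming is recovered.

    Returns None when the transformation yields no saving at all.
    """
    saving = time_before - time_after
    if saving <= 0:
        return None
    return -(-transform_cost // saving)  # ceiling division on integers


# A 2000 ms transformation that saves 50 ms per run pays off after 40 runs.
runs = break_even_runs(2000, 500, 450)
```

A transformation applied to a program that will be executed fewer
times than this break-even count costs more than it saves.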
3 Limitations on the Potential for Optimization


Although  the  nature  of  optimizing  transformations  poses  some 
inherent  difficulties,  the  greatest  obstacle  to  program  improvement 
stems  from  the  manner  in  which  the  implementation  system  functions 
during  program  execution.  The  descriptive  solution  system  specifies  a 
problem  by  defining  a  series  of  relations  (among,  for  example,  input, 
output, sub-problem solutions, inferential systems, and solution
goals). Within the execution system, the same elements have been
reformulated in terms of the ways in which the relations are computed.
In providing a logical/physical interface between these systems, the
implementation establishes a general paradigm or framework within
which  the  solution  is  expressed.  Since  this  paradigm  effectively 
creates  an  environment  guiding  processing  activities,  it  will  be 
referred  to  as  the  processing  environment. 

In  this  chapter,  two  general  processing  environments,  sequential 
and  applicative,  will  be  contrasted.  As  their  names  suggest,  the  most 
obvious  distinction  between  the  two  is  the  notion  of  program  control; 
this  is  not,  however,  the  only  point  of  difference.  Equally  clear 
distinctions  can  be  drawn  on  the  basis  of  such  features  as  the 
relative  importance  of  data  definition  versus  data  manipulation  in 
describing  the  implementation,  the  meaning  of  program  symbols,  the 
moment  at  which  properties  are  bound  to  them,  and  the  number  and  type 
of  interfaces  which  are  layered  to  form  the  implementation  system. 


A sequential processing environment views the underlying
architecture as a traditional von Neumann machine. This does not
necessarily mean that the processor controlling execution is
sequential, but simply that the problem solution is described in terms
of a sequence of steps to be carried out in a determined order.
Historically, most views of computation have been based on the notion
of instruction sequencing and by far the majority of existing systems
operate along these lines. As a natural consequence, most software
implementations make use of this processing environment.

The nature of sequential processing presupposes a relatively
constrained representation of the problem solution describing its
progression from start to finish; a program is therefore expressed in
terms of a series of operations which manipulate data. For this
reason, most programming languages designed for a sequential
environment are called "procedural",¹ "algorithmic", or "imperative".
They provide general data formatting capabilities as well as
high-level versions of the elementary control sequences available on
von Neumann architectures, such as repetition, selection, branching,
and inter-sequencing (subprogram linkage). Although the sections on
sequential processing occasionally make reference to specific language
implementations, this is for illustrative purposes only. The
similarities among procedural languages and sequential implementations
are so fundamental that they may be viewed as essentially homogeneous
in terms of their implications for program optimization.

¹ The use of "procedure" in reference to a sub-program unit thus
derives from the term "procedural" (i.e., sequential) describing the
nature of program specification.

Although most existing systems are sequential in nature, AI
programs are typically implemented in an applicative or non-sequential
processing environment. This type of environment does not view the
underlying machine as a von Neumann model of computation and therefore
is not suited to the same types of optimizations. Whereas the
sequential environment approaches execution as a discrete series of
algorithmic  steps,  non-sequential  processing  revolves  around  such 
concepts  as  reduction,  resolution,  and  unification.  Again,  this  does 
not  imply  anything  about  the  nature  of  the  processor  itself.  In  fact, 
since  the  majority  of  existing  computer  systems  rely  on  sequential 
processors,  most  applicative  processing  environments  are  superimposed 
on  sequential  implementation  layers.  It  is  only  with  the  growing 
interest in large-scale AI programs during the past decade that
non-sequential  architectures  have  become  a  viable  processing 
alternative. 

Non-sequential problem solutions are more concerned with defining
underlying relations than with prescribing their computation.
Just as the procedural languages provide a natural expression of the
sequential approach, non-sequential processing is best described by
the "symbolic" or "definitional" languages. These can be subdivided
into three groups which have evolved along divergent lines since the
first symbolic language, LISP, was developed by McCarthy in the 1960s.

The largest and best known group encompasses "functional"
languages, which view a program's output as a function (in the
mathematical sense) of the input. During execution, successive
reductions are used to simplify the program function until further
applications are impossible. Recursion and functional composition are
the primary control mechanisms, with each operation performed when the
result it generates is needed by an invoking instruction.
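The idea of successive reduction can be illustrated with a minimal
sketch. The tuple encoding of expressions and the names reduce_once and
normalize are assumptions made for illustration; they are not drawn
from any particular functional language implementation. The sketch
shows only the repeated simplification of a program expression until
no application remains, not the demand-driven scheduling of operations.

```python
def reduce_once(expr):
    """Apply one reduction step to the leftmost reducible subexpression.

    An expression is either a number or a tuple (operator, left, right).
    """
    op, left, right = expr
    if isinstance(left, tuple):
        return (op, reduce_once(left), right)
    if isinstance(right, tuple):
        return (op, left, reduce_once(right))
    # Both operands are already values, so the application itself reduces.
    return left + right if op == "+" else left * right


def normalize(expr):
    """Reduce until no further application is possible; keep every stage."""
    stages = [expr]
    while isinstance(expr, tuple):
        expr = reduce_once(expr)
        stages.append(expr)
    return stages


# (2 * 3) + (4 * 5), written as nested applications
stages = normalize(("+", ("*", 2, 3), ("*", 4, 5)))
```

Each entry of stages is the whole program after one more reduction;
the last entry is a plain value, to which nothing further applies.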

In contrast, "logical" languages employ resolution and
unification as their primary processing mechanisms. Program execution
is approached in terms of proof derivation. A series of propositional
logic implications formally describe the underlying terms or
assumptions and any inferential relations between them, with the
desired output expressed as one or more queries. To satisfy the goal,
appropriate patterns are selected from the rule base to produce a
solution space.
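Unification, the pattern-matching step on which resolution depends,
can be sketched briefly. The conventions below are assumptions chosen
for illustration: variables are strings beginning with an upper-case
letter, compound terms are tuples, and no occurs check is performed.
This is a simplified sketch rather than the mechanism of any specific
logical language.

```python
def is_var(term):
    """Assumed convention: a variable is a string starting in upper case."""
    return isinstance(term, str) and term[:1].isupper()


def walk(term, subst):
    """Follow variable bindings until an unbound variable or non-variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst=None):
    """Return a substitution making a and b identical, or None on failure."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


# Match a query pattern against a stored fact.
bindings = unify(("parent", "X", "bob"), ("parent", "alice", "Y"))
mismatch = unify(("parent", "alice"), ("parent", "carol"))
```

A successful match yields the bindings that satisfy the query; a
failed match yields None, and a resolution procedure would then try
another pattern from the rule base.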

The third class includes the newest addition to the spectrum of
programming languages, the "dataflow" languages. As the name
suggests, these approach execution as inherently concurrent, with the
firing of each instruction dependent solely on the availability of its
data inputs. Dataflow operations, like their functional and logical
counterparts, are expressed in terms of functional applications.
Although recursion is eliminated as a programming tool, the
mathematical notion of function is extended to allow the return of
more than one value.
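The firing rule can be sketched as follows. The graph encoding and the
name run_dataflow are assumptions for illustration, and the nodes that
become ready together are fired one after another here, whereas a
dataflow machine could fire them concurrently.

```python
import operator


def run_dataflow(nodes, inputs):
    """Fire each node as soon as all of its input values are available.

    nodes:  name -> (function, list of input names)
    inputs: name -> initial value
    """
    values = dict(inputs)
    pending = dict(nodes)
    firing_order = []
    while pending:
        ready = [name for name, (_, deps) in pending.items()
                 if all(d in values for d in deps)]
        if not ready:
            raise ValueError("no node can fire: some input never arrives")
        for name in ready:  # nodes in one batch are mutually independent
            fn, deps = pending.pop(name)
            values[name] = fn(*(values[d] for d in deps))
            firing_order.append(name)
    return values, firing_order


# The textual order of the nodes is irrelevant; only data availability counts.
graph = {
    "out": (operator.sub, ["prod", "sum"]),
    "sum": (operator.add, ["a", "b"]),
    "prod": (operator.mul, ["a", "b"]),
}
values, firing_order = run_dataflow(graph, {"a": 3, "b": 4})
```

Although "out" is listed first, it fires last, because its inputs do
not exist until "sum" and "prod" have produced their results.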

Applicative processing is not the only alternative to the
sequential environment. Strictly speaking, the term applicative is
appropriate only with respect to a functional or reduction paradigm;
unification and dataflow systems more properly constitute separate
non-sequential entities. The only systems which are currently capable
of meeting real-time performance standards, however, are applicative
(this topic is discussed in more detail in the next chapter).
Furthermore, the three systems share almost as many fundamental
similarities as do the procedural languages, as witnessed by the fact
that virtually all post-experimental implementations of logical and
dataflow languages are interpreted representations relying on
applicative evaluators. In keeping with the objective of assessing
the feasibility of optimization using available methodologies,
therefore, the discussions of non-sequential processing emphasize the
applicative model.

The  sections  which  follow  examine  the  functional  characteristics 
of  the  implementation  system  in  general  before  focussing  on  features 
specific  to  the  two  processing  environments.  Particular  attention  is 
given  to  those  features  which  have  implications  for  program 
optimization. 

3.1  Structure  of  the  Implementation  Solution  System 

Between the computer which the applications software user sees
and the physical machine controlling execution are a series of
abstract or virtual computers. Each level in the hierarchy represents
a functionally distinct machine with a specific instruction set,
resource configuration, and implementation strategy. The implications
for optimization are critical: program improvement at one level does
not necessarily result in efficiency at the next.

Figure 6 illustrates the abstract computer hierarchy for a
typical implementation system. At the highest level is the machine
defined by the applications program. The "program" it executes is the
input data, and execution is expressed in terms of the operations
available in the high-level language of the source program. Although
the applications programmer does not normally conceive of his program
as a "translator", in fact it functions as such by transforming the
input data into a "target program" which is executed by the computer
represented at the next level.

The operations of the applications program in turn serve as
"input" to the virtual machine defined by the high-level language
translator. This machine's actions are described by another
instruction set, generally the operations that are expressed in
mnemonic form as the assembly language of the target computer. The
assembly-level instructions provide an interface to yet another
abstract machine, this time executing the primitives provided by the
operating system. At the lowest level of the implementation system --
just above the physical interface to the execution system -- is the
microcode or sub-primitive abstract machine which translates operating
system functions to physical machine instructions.1

Figure 6. Abstract Machines in the Implementation Solution System

The abstract machine hierarchy thus bridges the semantic gap
between the description and execution systems by successive
interpretations. The multiplicity of layers greatly complicates
optimization activities. At each level, operations must be expressed
in terms of the available instruction set, generally without regard to
the appropriateness with which those instructions will be translated
into the next machine's operations. Inter-level transformations thus
tend to be "black boxes" from which little can be assumed about the
ultimate fate of optimization attempts.

The hierarchy of abstract machines clearly parallels the common
stratification of programming languages into classes of varying
degrees of machine dependence (Figure 7). The similarity may be
misleading, however, since the selection of a particular language does
not necessarily increase or diminish the number of implementation
mappings. The level of abstraction of the language does determine to
some extent the degree of optimization feasible. A programmer using
assembly language, for example, can modify his solution to take
advantage of specialized instructions, while the user of a
very-high-level language may unwittingly express the program in such a
way that no improvement is possible. In general, the difficulty of
transforming a program concisely and efficiently increases in direct
proportion to the degree of machine independence. Subsequent chapters
distinguish among levels of language where appropriate; if no mention
of this is made, it should be assumed that the effect is negligible.

1 Many implementations actually involve more levels than are
shown in Figure 6. The "operating system computer", for example, is
generally sub-stratified into a library program level, a utility
level, and perhaps a supervisory level. Each of these has a distinct
set of primitives providing a transition to the next lower level.

Another factor which imposes limitations on the nature and
degree of optimization activities is the manner in which each abstract
machine effects the transition from input to target program.
Compilation systems statically analyze the input program and generate
an output program which is completely expressed in terms of target
instructions.1 This output is ready for "execution" by the next
level's machine as though it had originally been coded in that form;
it is immaterial whether the program is executed immediately or at
some later date. In contrast, interpretation systems defer most
transformations until run-time, when the abstract machine interprets
or simulates the input program through the use of library routines,
which provide the low-level algorithms necessary to carry out each
instruction of the higher-level language. Most implementation
hierarchies include a combination of compiling and interpreting
translators. This complicates the optimization process since
transformations which are possible or desirable in one system may not
be feasible in the other.
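The difference between the two kinds of translator can be sketched for
a toy expression language. Everything here is an illustrative
assumption: the tuple encoding, the names interpret and compile_expr,
and the use of a Python closure as a stand-in for a target program.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def interpret(expr, env):
    """Interpretation: the input program is re-examined on every run."""
    if isinstance(expr, str):          # a variable name
        return env[expr]
    if isinstance(expr, (int, float)):  # a literal
        return expr
    op, left, right = expr
    return OPS[op](interpret(left, env), interpret(right, env))


def compile_expr(expr):
    """Compilation: analyze the input once and emit a target program.

    The closure returned here plays the role of the target program.
    """
    if isinstance(expr, str):
        return lambda env: env[expr]
    if isinstance(expr, (int, float)):
        return lambda env: expr
    op, left, right = expr
    fn, run_left, run_right = OPS[op], compile_expr(left), compile_expr(right)
    return lambda env: fn(run_left(env), run_right(env))


source = ("+", ("*", "x", 2), 1)       # x * 2 + 1
target = compile_expr(source)          # analysis happens once, here
interpreted = interpret(source, {"x": 5})
compiled = target({"x": 5})
```

Both routes give the same value, but the compiled form no longer
inspects the structure of the source on each execution, which is the
source of its speed and also the reason some run-time transformations
are no longer available to it.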

Thus far, the functioning of each abstract computer has been
characterized in terms of its interaction with adjacent machines. In
the sections which follow, the implementation system is approached in
terms of the framework imposed by the processing environment selected
for program execution. It is important to note that the processing
environment, like other features of the implementation system, is
neither wholly physical nor wholly logical. It must concretize both
problem definitions and computational procedures, yet it does so in
terms of an abstract rather than an absolute view of the underlying
physical architecture.

1 The process is also referred to as "translation"; in the
present discussion, however, this term is used generically to include
both compilation and interpretation.

3.2  Characteristics  of  Sequential  Processing 

In the sequential environment a clear distinction is made
between control and data elements, with a separate portion of program
storage allocated to each. The control segment represents the series
of instructions to be performed, so elements may only take on certain
configurations (i.e., instruction codes accepted by the target
abstract computer) and their ordering is crucial. With few exceptions,
the control segment is immutable, that is, its contents are not
altered during execution. The data segment provides storage elements
which can be manipulated by the instructions. In this case the
ordering and configuration of individual elements is incidental
(although possibly subject to formatting constraints imposed by the
target) and the contents are mutable, that is, they may be freely
altered during execution. The control/data dichotomy is maintained
throughout the abstract machine hierarchy of the implementation
system. The transformation effected by each level includes
high-to-low-level, one-to-many mappings of both instructions and data
items. As we shall see, this characteristic division has important
repercussions on the nature of optimization activities.

Figure 8 illustrates the layering of abstract machines typically
found in sequential processing environments. Below the applications
program computer lie two levels of system support implemented by means
of software and firmware. Software support machines include those
expressed in terms of the primitive operations provided by the
programming language, library routines established on levels specific
to or independent of the programming language, and the underlying
sub-primitives available through the operating system. (Ada language
implementations are somewhat different from this configuration, since
the KAPSE typically provides machine dependent, low-level services
which wholly or partially replace access to the normal operating
system subprimitives. In this respect, it is similar to the layering
of non-sequential environments established on sequential
architectures; see Figure 9.) The firmware level machine transforms
the software instructions into the microcode subprimitives used by the
hardware control unit.

The structuring of the programming language implementation may
allow the applications program to directly access the library
routines, but this is not always the case. These potential interfaces
are indicated in Figure 8 by vertical channels bypassing other
interface levels. The implications of support layering should be
obvious, since each level conceptually represents an abstract machine:
the number of transformations required during execution is in direct
proportion to the depth of support layers.

Suppose that a given machine $M$ in the system receives as input
a version $P_M$ of the program. $P_M$ includes two portions, a control
segment constructed from the instruction set for $M$ and a data
segment consisting of a series of storage elements; these are
represented as $I_M = \{i_1, i_2, \ldots, i_j\}$ and
$D_M = \{d_1, d_2, \ldots, d_k\}$, respectively. $M$ transforms $P_M$
to an output version $P_{M'}$, targeted to the next machine, $M'$ --
i.e., $I_{M'} = \{i'_1, i'_2, \ldots, i'_m\}$ and
$D_{M'} = \{d'_1, d'_2, \ldots, d'_n\}$. Because $M'$ is closer to the
physical level than is $M$, it is typically the case that
$|I_{M'}| > |I_M|$ and $|D_{M'}| > |D_M|$. Below $M'$ are additional
machines $M''$, etc., each processing a more primitive level of
instructions.
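The growth of the control segment from one machine to the next can be
sketched with invented instruction sets. The instruction names and
mapping tables below are purely hypothetical, and only the control
segment is modeled; the data segment would expand in an analogous way.

```python
def lower(program, table):
    """Translate a program by a one-to-many mapping of its instructions."""
    output = []
    for instruction in program:
        output.extend(table[instruction])
    return output


# Hypothetical mapping from machine M to the next machine M'.
M_TO_M1 = {
    "assign": ["load", "store"],
    "add": ["load", "load", "plus", "store"],
}

# Hypothetical mapping from M' to the machine below it, M''.
M1_TO_M2 = {
    "load": ["fetch_addr", "read"],
    "store": ["fetch_addr", "write"],
    "plus": ["alu_add"],
}

program_m = ["assign", "add"]
program_m1 = lower(program_m, M_TO_M1)
program_m2 = lower(program_m1, M1_TO_M2)
sizes = [len(program_m), len(program_m1), len(program_m2)]
```

With these tables, two instructions at the top level become six at
the next level and eleven at the level below, which illustrates why
the number of layers matters to overall efficiency.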


Since  it  represents  the  cumulative  efficiency  of  all  abstract 
machines  in  the  system,  program  efficiency  is  in  general  adversely 
affected by the number of support layers. When a short-circuit
channel exists from M to M'', however, it becomes feasible for the
output program to include some instructions which are already
targeted to the lower machine, thus obviating one or more levels of
transformation. A well-designed translator can mitigate the effects
of layering by expressing the output program in such a way that it can
continue to be transformed efficiently at lower levels and/or take
advantage of short-circuit channels to bypass processing.


3.3  Optimization  in  the  Sequential  Environment 

To optimize program storage space, it is clearly necessary to
minimize

$\mathrm{size}(P_{M_n}) = \mathrm{size}(I_{M_n}) + \mathrm{size}(D_{M_n})$,

where $M_n$ is the final (lowest) abstract machine in the
implementation system. (Note that a minimal total transformation does
not necessarily mean that each partial transformation is minimal,
although this is typically the case.) Time optimization is not as easy
to characterize. The generalized view of execution (employed in the
preceding chapter) as a linear series of program actions must now be
reformulated in terms of the abstract machine instructions used to
express the program, including nonlinear control directives. Total
execution speed will depend on the speed of each target instruction
and the number of times -- possibly none -- it is executed, rather
than the number of instructions (i.e., $|I_{M_n}|$).
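The distinction between program size and execution time can be made
concrete with a small calculation. The cost figures and execution
counts below are invented for illustration; the only point is that
time is a sum over instructions of cost multiplied by frequency.

```python
def execution_time(profile):
    """Sum cost * frequency over the instructions of a program.

    profile: list of (cost_per_execution, times_executed) pairs,
    one pair per instruction in the control segment.
    """
    return sum(cost * count for cost, count in profile)


# Three instructions, two of them inside a loop executed 100 times.
looped = [(1, 1), (2, 100), (1, 100)]

# Ten instructions, each executed exactly once.
straight_line = [(1, 1)] * 10

looped_time = execution_time(looped)
straight_time = execution_time(straight_line)
```

The shorter program is by far the slower one here, so minimizing the
instruction count alone says little about execution speed.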

The remainder of this section will refer to program execution in
terms of the operational semantics of a single abstract machine, $M$,
selected arbitrarily from the implementation system. In describing
the effects of executing the program $P_M$, it should be obvious that
we are modeling the simultaneous operation of all abstract computers
in the configuration. This is consistent with the observation that
available optimization techniques are based on general aspects of the
control/data dichotomy rather than features specific to any single


the  initial  state.  Any  instruction  occurring  in  a  program  state 
snapshot,  however,  is  by  definition  reachable  through  at  least  one 
computation  sequence.) 

It is important to observe that the computation of the next
instruction of the control segment is distinct from any transformation
made to the data segment. Because to a large extent control and data
function autonomously (with the obvious exception that transfers of
control may be contingent upon data values), they may be analyzed
independently; hence the terms control flow analysis and data flow
analysis. Control flow analysis establishes the feasible progressions
of the i components of program states by considering what instructions
may be executed in what sequences beginning at the initial state.
Data flow analysis, on the other hand, concentrates on the range of
the D components by determining what values may be taken on by
individual data items.
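One elementary form of control flow analysis, finding the instructions
reachable from the initial state, can be sketched as a graph
traversal. The successor table and instruction numbers below are
invented for illustration, and real analyses work on richer
representations than this.

```python
def reachable(successors, start):
    """Return every instruction reachable from start by following control."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(successors.get(node, []))
    return seen


# Hypothetical control segment: instruction -> possible next instructions.
# Instruction 2 is a conditional branch; 3 and 4 form a loop back to 2.
control_flow = {
    1: [2],
    2: [3, 5],
    3: [4],
    4: [2],
    5: [],
    6: [5],   # no instruction transfers control to 6
}

live = reachable(control_flow, 1)
dead = set(control_flow) - live
```

Instruction 6 can never be executed from the initial state, so an
optimizer could remove it without examining the data segment at all,
which is what allows the two analyses to proceed independently.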

Appendix  A  describes  optimization  activities  in  the  sequential 
environment  and  summarizes  the  results  of  studies  attempting  to 
quantify  what  effect  optimizing  transformations  have  on  program 
performance.  In  spite  of  the  inconclusive  nature  of  this  data,  it  is 
possible  to  generally  identify  the  features  of  the  sequential 
processing  environment  having  greatest  influence  —  favorable  or 
adverse  —  on  optimization  potential. 

The importance of the support system configuration has already
been described. Both the number of layers and the quality of the
translations influence the overall effectiveness of improvement
efforts. In addition, optimization activities are favorably
influenced by such programming practices as the intelligent selection
of data formats to minimize the need for coercion or conversions, the
organization of expressions to facilitate the application of algebraic
transformations, and a careful placement of non-optimizable
subexpressions (such as those involving invocations of user-defined
functions) with respect to loops or other control structures.
Optimization is adversely affected by the use of language features
which interfere with control and data flow analysis or value
propagation. These include, for example, aliased or dynamically
allocated variables, global or other data storage which is modified by
side effects, unconditional transfers spanning several control
structures (e.g., a GOTO whose target is outside the boundaries of the
enclosing logical interval), the use of control variables (i.e., entry
and label variables) and/or directly or indirectly recursive
subprogram units, and mixed mode expressions.

Optimization techniques in the sequential environment, as we
have seen, rely extensively on a few fundamental assumptions: (1) a
bipartite control/data organization, consisting of an immutable
control segment and a mutable data store; (2) the use of program
symbols as references to specific locations in the segments; (3) a
subsequent dependence on static (pre-execution) analysis to associate
or bind symbols to locations; (4) an ultimate restriction of program
control to very simple primitives, namely sequencing and conditional
branching; and (5) a corresponding reliance on iteration and selection
as the basic control flow mechanisms. Any deviation from these can
seriously impede improvement activities at all levels. Consequently,
the two factors most important in establishing potential for
optimization are, without doubt, the way in which the problem solution
is expressed by the programmer1 and the configuration of support
layers in the implementation system.

3.4  Characteristics  of  Applicative  Processing 

Unlike the sequential environment, applicative processing does
not employ a bipartite organization nor does it approach problem
solution in terms of a series of operations manipulating data
elements. Instead, a program is a function applied to the input; the
resulting value is the output. Since no real distinction is made
between program data and problem data, a single common representation
is used to represent all program elements internally. Function
definitions and data items are both stored as linked lists composed
ultimately of atoms. Even language primitives are atoms like any
others, distinguished only by the fact that they are defined by the
system when execution begins.

Furthermore, the value of a symbolic language function, like
its mathematical counterpart, is determined solely by the values of
its arguments. This property, called referential transparency, has a
profound impact on processing organization. At a given point during
execution, the computation to be performed depends only on current
context, not on the history of actions which led up to it. The
notions of program state and computation sequence are therefore
entirely absent from the applicative environment.

1 Recall that this is independent of any consideration of how
well or poorly the selected algorithm is suited to the problem,
although of course even the best optimizer cannot compensate for an
inefficient algorithm.

Since program symbols no longer represent locations in the data
store, the fundamental units of procedural language programs
(expressions and assignments) lose their power in the applicative
setting. For example, the instruction

A = B

in the sequential environment indicates that the contents of location
B should be retrieved and copied to location A, overwriting any
previous value. In the applicative environment, an instruction of
this sort is viewed as establishing a definition or association
between the values of A and B, rather than performing an operation.
Since A and B have no "locations" per se, values are not directly
stored or copied, a trait reflected in language syntax, which
restricts the appearance of a variable to the lefthand side of only
one equation per program. (Note that because it is the notion of
definition rather than computation which is implicit, the statement
A=A+1 is meaningless in the applicative environment.)

Instead, the power of the applicative environment derives from

data list may be transformed into a program list and vice versa, with
few if any restrictions. Since program symbols are viewed as values,
they may freely utilize non-consecutive bounds or properties which are
established and altered dynamically.

In keeping with the notion of functional organization, the
primary control mechanisms are purely applicative: functional
composition and recursion. Even the conditional construct is
generally a simple variant of the projection function, whereby the
value returned is the first one computable by the terms of the
construct. Operations on data items are considered to have locality
of effect, that is, they produce new items rather than altering
existing ones. As we shall see, this has important implications for
parallelism.
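Locality of effect can be sketched with linked lists built from pairs.
The names cons, append, and to_python are illustrative assumptions,
with None standing for the empty list; the sketch is not taken from
any particular symbolic language system.

```python
def cons(head, tail):
    """Build a new list cell; existing cells are never modified."""
    return (head, tail)


def append(xs, ys):
    """Return a new list holding xs followed by ys; alters neither."""
    if xs is None:
        return ys
    head, tail = xs
    return cons(head, append(tail, ys))


def to_python(xs):
    """Convert a linked list to a Python list for inspection."""
    out = []
    while xs is not None:
        head, xs = xs
        out.append(head)
    return out


a = cons(1, cons(2, None))
b = cons(3, None)
c = append(a, b)
```

After the call, a and b are unchanged and c is a new item that merely
shares the cells of b. Because no operation can disturb data held by
another, independent applications may in principle be evaluated in
any order, or at the same time.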

An immediate consequence of the functional approach to problem
solution is that almost no von Neumann hardware-related features can
be used directly by the applicative environment. If the underlying
architecture is sequential, program execution must be simulated by one
or more extra abstract computers that indirectly interpret applicative
actions, using sequential software routines which can be translated
directly into machine primitives. Figure 9 illustrates the
configuration of system support layers typical to most symbolic
language implementations. Figure 10 represents the configuration when
the underlying architecture is non-sequential. In this case, much of
the software simulation can be replaced by a firmware interpreter that
translates software primitives directly.

Figure 9. System Support Layering in Applicative
Environments Hosted on Sequential Processors


Below the applications program machine is an integrated language
environment, which supplies debugging aids and perhaps other
development support tools. Its output is in the form of the
primitives used by general language support facilities, primarily the
interpreter and the garbage collector. At the next layer, each
processing primitive is represented by a software routine which
simulates the basic operations such as functional invocations, data
referencing, and storage allocation and deallocation. It is at this
point that the distinction between sequential and non-sequential hosts
becomes apparent. In most configurations, the central control
mechanisms must be simulated by means of a software subroutine which
effectively converts execution from applicative to sequential form;
from this point the abstract machines will be identical to those of
the sequential environment depicted in Figure 8. If, instead, the
target execution system employs a symbolic processor, the primitives
are translated via microcode at the firmware support level.

It should be clear that a primary cause of inefficiency in most
existing applicative environments is the excessive amount of
simulation required. The configuration of Figure 9 suffers not only
because of the inordinate number of transformations which must be
applied, but also from the difficulties inherent in interpreting
non-sequential instructions in sequential form. The situation is
further exacerbated by the exigencies of late binding; even primitive
arithmetic operations must be "simulated" to the extent of performing
run-time type checks and conversions.


3.5  Optimization  in  the  Applicative  Environment 


Since the concept of program state is missing in the applicative
environment, optimizations based on normal control flow and data flow
analysis are inappropriate. Unfortunately, few optimizing
transformations have been developed specifically for the applicative
environment. This can be blamed to a great extent on lack of demand,
due partly to the lack of popularity of symbolic programming prior to
the mid 1970s and partly to the habitually casual approach to
efficiency taken by the artificial intelligence community as a whole.
It is to be hoped that the recent interest in real-time AI
applications will result in new developments in this area.

The simplest way to achieve significant performance improvements
is to take advantage of the compilation facilities offered by many
applicative implementations. A compiled function is transformed into
instructions corresponding to those produced by the language primitive
abstract computer. When the function is invoked, only the firmware
interpreter (or its multilayer software equivalent in the case of a
sequential host) is needed to complete the translation process.1
Although this cannot be considered an optimization technique in the
strictest sense, it is the only means of improvement available in many
systems.

1 It is not often possible to entirely eliminate high-level
interpretation, however, since most configurations require that even
compiled functions be activated through the language environment layer
rather than by direct user invocation.


Like their sequential counterparts, symbolic language optimizers
make several assumptions about the nature of input programs; deviation
from these norms impedes or precludes improvement. On the basis of
locality of effect, data dependencies are considered to be localized,
i.e., subprogram units have no side effects. Since the basic control
mechanisms are functional composition and recursion, programs are
assumed to be made up of a large number of small, often recursive
units. This means that a substantial portion of execution time is
devoted to activating the linkages between units. Finally, late
binding is quintessential in the applicative environment. Static
analysis cannot suffice to associate symbols with attributes since the
properties of both functions and data items may be created, redefined,
or destroyed at arbitrary points during execution. Instead, a heap
area must be maintained in which storage may be allocated and
deallocated in a relatively unstructured fashion.
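
The late-binding assumption can be illustrated with a minimal sketch.
Python is used purely for illustration (the systems discussed in this
report are LISP-based), and the names bindings, call, assess, and
threat_level are hypothetical. Each symbol is looked up in a run-time
table at the moment of the call, so a redefinition during execution
changes the behavior of code written earlier, which is why static
analysis alone cannot fix the association.

```python
# Hypothetical symbol table: names are bound to definitions at run time.
bindings = {}

def call(name, *args):
    """Look the symbol up at the moment of the call (late binding)."""
    return bindings[name](*args)

def assess(track):
    # 'threat_level' is resolved only when assess() actually runs.
    return call("threat_level", track)

bindings["threat_level"] = lambda track: "low"
first = assess("track-1")

# Redefine the function at an arbitrary point during execution.
bindings["threat_level"] = lambda track: "high"
second = assess("track-1")

# Destroying the binding is equally legal; a later call would then fail.
del bindings["threat_level"]
```

The same call to assess yields different results before and after the
redefinition, although the text of assess never changes.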

Program execution in the applicative environment can perhaps
best be seen as the alternation of two activities, substitution and
simplification. Substitution (unfolding) refers to the action of
replacing a program symbol by its definition. This is followed by
simplification (evaluation), which replaces the definition by the
result obtained through evaluating the body of the definition. In the
case of atoms and data lists, simplification is trivial, since the
value is obtained through a search of the storage area. For
functions, simplification will require the application of additional
substitutions and simplifications, perhaps recursively.
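
The alternation of substitution and simplification can be sketched as
a very small evaluator. This is an illustrative sketch only, not a
description of any particular LISP system: expressions are assumed to
be nested tuples, the names storage, definitions, primitives,
substitute, and simplify are hypothetical, and arguments are assumed
to be evaluated before unfolding (applicative order).

```python
# Storage area for data atoms, and definitions for program symbols.
storage = {"x": 4}
definitions = {
    # fact(n) = 1 if n == 0 else n * fact(n - 1)
    "fact": (("n",), ("if", ("eq", "n", 0), 1,
                      ("mul", "n", ("fact", ("sub", "n", 1))))),
}
primitives = {
    "eq": lambda a, b: a == b,
    "mul": lambda a, b: a * b,
    "sub": lambda a, b: a - b,
}

def substitute(body, env):
    """Unfolding: replace parameter symbols in a definition body by values."""
    if isinstance(body, str):
        return env.get(body, body)
    if isinstance(body, tuple):
        return (body[0],) + tuple(substitute(part, env) for part in body[1:])
    return body

def simplify(expr):
    """Evaluation: replace an expression by the result of evaluating it."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):            # atom: a search of the storage area
        return storage[expr]
    head, args = expr[0], expr[1:]
    if head == "if":                     # evaluate only the selected branch
        test, then_part, else_part = args
        return simplify(then_part if simplify(test) else else_part)
    values = [simplify(arg) for arg in args]
    if head in primitives:
        return primitives[head](*values)
    params, body = definitions[head]     # substitution, then simplification
    return simplify(substitute(body, dict(zip(params, values))))

result = simplify(("fact", "x"))
```

For the atom x the work is a single lookup; for fact each
simplification triggers a further substitution until the recursion
bottoms out.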


Appendix B describes the optimization techniques which have been
developed for applicative environments. An attempt is also made to
evaluate the results of studies on the effects of applying such
transformations. It is quite difficult to characterize the features
of the applicative processing environment having greatest influence on
optimization potential, other than the presence/absence of the
applicative-to-sequential transformation. The paucity of behavioral
data, coupled with the disturbing nature of available results,
relegates such efforts to pure speculation.

If we assume that current techniques address the real problems
(and this is by no means a safe assumption), then improvement
should be facilitated by the concentration of numerical computations
in fewer subprogram units and a reliance on purely applicative
constructs. The use of "special" features which make a symbolic
language resemble a procedural one should be avoided, particularly
iterative structures, GOTO-like transfers, and pathological binding
strategies. The use of type declarations, however, should be
beneficial since it would facilitate several types of improvements.

Optimization techniques in the applicative environment rely on
the following assumptions: (1) a homogeneous internal representation;
(2) the treatment of program elements as definitional values rather
than storage locations; (3) a subsequent dependence on dynamic
binding; (4) the use of functional composition and recursion as the
primary control mechanisms; and (5) a corresponding reliance on the
properties of referential transparency and locality of effect. The
elimination of system support levels by moving implementations to
non-sequential processors is clearly the best way of achieving
significant performance improvement given current methodology. The
application of automatic optimizations must be viewed with some
skepticism until their effectiveness is demonstrated by empirical
study.

4 A Strategy for Maximizing Optimization Potential


Previous chapters discussed the nature of program performance
and the ways in which it can be enhanced in typical implementation
settings. As we have seen, the types of improvements that may be
applied are determined by the characteristics of the processing
environment. Particular optimizers may utilize larger or smaller
subsets of the available techniques with greater or lesser
effectiveness, but the limits are established globally by the
environment itself. Since our concern is with the efficiency of AI
algorithms in the implementation solution system, it follows that we
should focus our attention on the selection of the processing
environment.

The ideal environment would be one which guarantees optimal
program performance in all cases. Since efficiency and optimization
are at best relative terms, this is patently impossible. It is
unlikely that any single configuration can predictably maximize the
performance of even a small subset of the AI problems posed by BM/C³
applications. How, then, are we to realistically select an
environment for the optimization of an arbitrary AI algorithm? The
sections which follow establish criteria for evaluating processing
environments in terms of their responsiveness to program needs. A
divide-and-conquer form of implementation is then presented. This
strategy partitions a program into segments for processing in a
heterogeneous environment, thereby maximizing program "optimizability".

4.1 Evaluating Processing Environments

Any process which relies on successive transformations falls
prey to the dangers of inefficiency and inaccuracy. The
implementation system, responsible for bridging the quite considerable
gap between conceptual and physical solution, is undoubtedly a major
source of performance degradation in all computing systems. The
development of efficient AI programs must ultimately depend on the
capability of the processing environment to apply automatic
improvements which at least partially compensate for this introduced
inefficiency.

One criterion for choosing a processing environment is the
number of abstract machines in the system. The relationship of system
support layering to performance has already been addressed. Clearly,
the best layering configuration is that which will require the fewest
translations in executing the program. Relevant considerations
include the number of layers present, the availability of
short-circuits to bypass intermediate levels, and the fact that
individual layers do not necessarily play equal roles in determining
the overall effectiveness of the system. An alternative is to select
optimizers which exploit the nature of a particular layering
configuration. Since optimizing transformations are never applied
uniformly across all portions of a program, this involves assessing
the relative likelihood of preconditions for improvement and the
ability of translating mechanisms to take advantage of short-circuit
channels, as well as the effectiveness of each type of transformation.
It is obviously desirable that improvement efforts be applied at all
levels of the implementation to achieve optimum performance. The two
approaches can be combined in an approximation of the minimax
game-playing heuristic. The goal is to choose the system
configuration which combines a minimum number of abstract machines and
a maximum degree of automatic improvement. Such a selection sets an
upper bound on the degree of efficiency attainable by an arbitrary
program, but it cannot guarantee a lower bound.

Another aspect of the processing environment is that it provides
a computational paradigm within which the problem solution must be
structured. Here we find that the sequential and applicative
environments, like the languages which naturally express each
paradigm, differ radically. The procedural/sequential approach to
problem solution concentrates on data manipulation and alteration
through a strictly ordered sequence of operations. The
functional/applicative solution, on the other hand, computes by value
rather than by effect, so the program description focuses on
relationships instead of the ways in which they are computed.

In terms of computational power, the two paradigms are
approximately equal. Sequential processing corresponds to the Turing
model of computation with its clear delineation of control and data.
Program instructions are encoded in an immutable store and selected
for execution by means of simple sequencing or conditional transfers;
operations can examine or alter the contents of the mutable data
store. Associated with each action is a program state, which
encapsulates the history of the computation to that point and
determines the next action to be taken. Functional processing
approaches computing from the standpoint of recursion theory. In this
case, control and data elements are treated homogeneously as values.
Execution is a process of functional evaluation, sequenced by
functional composition and recursion; the next computation is thus
determined by context rather than history. The class of problems
computable by means of recursion theory is not, strictly speaking, as
general as that described by the Turing model. Since the differences
are pathological, however, we can view the two as equivalent for the
implementation of AI algorithms.
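
The contrast between the two paradigms can be made concrete with a
small sketch. Python is used only for illustration, and the function
names are hypothetical. Both routines compute the same sum of squares:
the first advances an explicit program state through ordered updates
of a mutable store, while the second defines the result by value
through composition and recursion.

```python
def sum_squares_sequential(values):
    # Sequential paradigm: a mutable data store updated by ordered steps.
    total = 0                 # program state carried between steps
    index = 0
    while index < len(values):
        total = total + values[index] * values[index]
        index = index + 1
    return total

def sum_squares_applicative(values):
    # Applicative paradigm: the result is defined by value, via recursion.
    if not values:
        return 0
    return values[0] * values[0] + sum_squares_applicative(values[1:])
```

In the first version the next action depends on the accumulated state
(total and index); in the second it depends only on the expression
currently being evaluated.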

The types of optimization appropriate to each environment have
been discussed in considerable detail. Unfortunately, it is
impossible to compare the two impartially in terms of performance
since published findings are vague and self-contradictory. Figure 11
illustrates a single benchmark observed by [Gabriel 85] on a variety
of systems. The results are totally inconclusive. The report
explains few details (e.g., the comparative machine configurations
and some of the options are not described) and the experimental
conditions do not survive close scrutiny. Furthermore, the Tak
benchmark itself is of questionable utility since it involves some
64,000 recursive calls and 48,000 decrements but nothing more.1
Statistically relevant data comparing the two environments is simply
not available at any level.

1 That Gabriel was not purporting to compare LISP to other
languages is immaterial; the findings for individual LISP systems are
subject to the same lack of coherence.

Figure 11. Gabriel's Tak Benchmark
Comparing Lisp with Procedural Languages
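
For reference, the Tak function can be sketched as follows. This is an
illustrative transcription into Python rather than the benchmark code
itself; the counters calls and decrements and the helper dec are
hypothetical additions for instrumentation, and the arguments 18, 12,
and 6 are assumed to be those conventionally used for the benchmark.

```python
calls = 0
decrements = 0

def dec(n):
    """Subtract one, counting each decrement performed."""
    global decrements
    decrements += 1
    return n - 1

def tak(x, y, z):
    """Takeuchi function in the form used by the Tak benchmark."""
    global calls
    calls += 1
    if not (y < x):
        return z
    return tak(tak(dec(x), y, z), tak(dec(y), z, x), tak(dec(z), x, y))

result = tak(18, 12, 6)
```

After the run, the two counters can be compared with the call and
decrement figures quoted above. The body contains nothing but
comparisons, decrements, and recursive calls, which is the basis of
the objection that the benchmark exercises little else.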

The expressive power afforded by the two environments is also
difficult to compare, primarily because the computational paradigms
are essentially incomparable. In general, the symbolic languages
allow a simpler, more elegant description of problems. Output can be
clearly expressed as a function of input and a single algorithm can be
uniformly applied to individual data objects or entire classes of
them. Procedural languages, on the other hand, are more natural to
most programmers. Their form is familiar and reflects the human
tendency to view problem solving as a sequence of actions. The
selection of an environment on the basis of this criterion must
ultimately depend on the problem to be solved. Some problems are
inherently sequential and others inherently recursive; it is as
difficult to express the former in terms of an applicative solution as
it is to describe the latter sequentially.

Representational power describes the suitability of the
environment for implementing AI programs. The general objective of
artificial intelligence is to encode knowledge about some domain and
then use that knowledge to solve problems in the domain. The
environment selected must provide suitable means for encoding
information, retrieving data that is relevant to the problem, and
determining a satisfactory solution. Furthermore, the system must
conform to software engineering standards for verification,
maintenance, and so forth. The procedural languages are criticized
for their emphasis on calculations rather than fundamental
relationships and the inherent difficulty of proving program
correctness. The major complaints against symbolic languages are the
convolutions necessary to express simple sequencing, the lack of
applicability of standard testing techniques, and the intrinsic
inefficiency of heap storage management. Overall, the emphasis given
to problem relations makes the applicative environment somewhat more
appropriate for representing typical AI problems. The sequential
environment, on the other hand, is better suited to the application of
software engineering methodology.

In summary, neither the sequential nor the applicative
environment possesses an undisputed superiority for the efficient
implementation of AI algorithms. Each approach has inherent strengths
and weaknesses which can have significant impact on the performance of
large-scale software systems.

4.2  The  Environment  Spanning  Strategy 

The AI algorithms needed for the BM/C³ setting can be
categorized in general terms as search, reasoning, or constraint
satisfaction problems. A search algorithm conceptually views the
entire solution space and attempts to find a suitable path through it
(e.g., track discrimination problems). A reasoning algorithm
accumulates data by deducing it from previous truths and adds it to
the knowledge base for future deductions (e.g., attack assessment


sensor measurement processing and precision track and discrimination
functions; it includes color correlation, scan-to-scan correlation,
single target processing, discrimination, irradiance calibration, and
stellar attitude update routines. The OTI study concentrated on the
area of scan-to-scan correlation, where it was felt that the greatest
improvements could be realized by reformulating the Nichols algorithms
for applicative processing. Benchmark results were compared for
versions in Pascal and LISP on a VAX 11/780. Pascal out-performed
LISP in data storage tasks (in spite of the fact that there was some
pro-LISP bias in the data structures chosen for the programs), while
LISP was faster at windowing transformations. The OTI study
considered these to be mixed results, an understandable reaction in
view of their stated desire to demonstrate the superiority of symbolic
processing in situations requiring the dynamic correlation and
updating of large amounts of data.

From our viewpoint, however, this case typifies a problem common
to most AI programs in real-time settings. Some of the subtasks
involved, such as I/O, sorting, numerical calculation, and storage
operations, represent exactly those operations which are intrinsically
suited to the sequential paradigm. Others, such as pattern matching
and discrimination, intuitively fall into the realm of applicative
processing. Each system performs well when most processing is of an
appropriate type, and each can be overwhelmed when subjected to large
amounts of unsuitable activity. Most programming languages appear to
provide for both sequential and applicative activities by
incorporating syntactic features mirroring those of their
counterparts. Since the two paradigms are so radically different,
however, the resemblances are strictly superficial.

To demonstrate the important role played by computational
suitability in setting an upper bound on performance, researchers at
Auburn University extended the OTI routines. First, it was concluded
that the existing programs did not in fact particularly exercise those
task areas for which LISP was most appropriate (e.g., uniform
treatment of program and problem data, recursion, and properties
requiring dynamic binding). The manipulation of dynamic property
lists was added to the original program to correct this bias. In
keeping with the nature of general discrimination activities, the
property lists were designed to represent any of a variety of
supplementary data gathered sporadically on a by-demand basis by
special-purpose sensors. Since this information does not apply to all
tracked objects and is updated only occasionally, it would create
undesirable burdens on data processing if incorporated directly in the
main data store; instead property lists are allocated dynamically when
and if needed. Particular care was taken to make sure that the
benchmarks were as equivalent as possible, given the syntactic and
semantic limitations of the two languages.
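
The idea of keeping sporadic sensor data out of the main data store
can be sketched as follows. This is a minimal illustration in Python,
not the Auburn or OTI code; the record layout, the names Track,
main_store, property_lists, put_property, and get_property, and the
sample reading are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Track:
    # Fixed-format record held in the main data store for every object.
    track_id: int
    position: tuple
    velocity: tuple

main_store = {}        # track_id -> Track
property_lists = {}    # track_id -> dict, created only when first needed

def add_track(track_id, position, velocity):
    main_store[track_id] = Track(track_id, position, velocity)

def put_property(track_id, name, value):
    """Attach supplementary data, allocating the property list on demand."""
    property_lists.setdefault(track_id, {})[name] = value

def get_property(track_id, name, default=None):
    return property_lists.get(track_id, {}).get(name, default)

add_track(1, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
add_track(2, (5.0, 5.0, 0.0), (0.0, 1.0, 0.0))
put_property(2, "infrared_signature", 0.73)   # hypothetical sensor reading
```

Only track 2 acquires a property list; track 1 carries no storage
overhead for data it never receives.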

The resulting differences in performance were as expected. LISP
clearly out-performed Pascal when dynamic capabilities were required,
but did poorly when a sequential paradigm was more appropriate. Track
unification is another KAS situation where symbolic processing should
emerge a clear winner since, as the OTI study pointed out,
considerable savings in calculation could be realized if closely
clustered objects can be treated as a single object for analytical
purposes until such a time as their tracks diverge. On the other
hand, computation-intensive tasks such as discrimination and
calibration would undoubtedly perform better as sequential
implementations.

The  solution  is  clear:  real-time  AI  algorithms  should  be 
developed  for  an  implementation  system  which  allows  a  true  combination 
of  sequential  and  applicative  processing.  We  therefore  propose  a 
strategy  that  utilizes  available  methodology  to  combine  the  two 
computational  paradigms.  Since  an  implementation  of  this  type  must 
clearly  bridge  two  distinct  environments,  the  strategy  is  referred  to 
as environment spanning. Although it represents a somewhat radical
departure  from  the  typical  implementation  system,  the  description  and 
execution  systems  remain  unaltered  and  no  special  equipment  or 
techniques  are  necessary  to  effect  the  change. 

Environment spanning begins during the initial program design
phase, when the implementation system is originally selected. The
problem solution, rather than being expressed uniformly in terms of a
single paradigm, is partitioned into groups of subtasks suited to
sequential and applicative processing. The criteria for assigning an
activity to an environment will normally be those established in
previous sections, although they do not preclude the possibility of
utilizing algorithms or modules which have already been implemented
according to one paradigm or the other. The software components from
the two environments are ultimately interfaced using one of the
environment spanning methods described below.


4.3 Environment Spanning Implementations

Environment spanning represents a type of heterogeneous
computing environment, that is, a configuration in which dissimilar
hardware and/or software systems function cooperatively. The
exponential increase in available hardware and software components
during the past two decades has inspired the development of a number
of heterogeneous environments, but most efforts have been devoted to
interfacing physical elements rather than software facilities. Since
environment spanning affects only the implementation system, the
actual hardware configuration is irrelevant. What is essential is
some means of coordinating sequential and applicative processing
components.

The key to exactly what type of cooperation is needed can be
found in the nature of the structural frameworks imposed by the two
paradigms. Symbolic programs are effectively limited to evaluation
activities, which restricts the applicative/sequential interface to
the passing of functional values. Since multivalued functions are not
supported in the procedural paradigm, this is further restricted to
the passing of single values. Clearly, what is needed in each of the
two spanned environments is some means of remotely activating a
component of the other and later receiving a returned value. There
are three major ways of achieving this.

The most straightforward environment spanning method is the
simultaneous operation of sequential and applicative systems in a
parallel architecture. Figure 12 illustrates this type of
configuration; the arrangement is equally suitable when the
applicative environment is superimposed on a sequential processor.
The components are developed independently on the two processors and
eventually exchange signals and data via the communication channels
established for the parallel system. At least two symbolic processor
manufacturers are currently engaged in developing architectures which
combine symbolic and sequential processors linked by a common bus, but
no systems are commercially available at the present time. For this
reason, although parallel processing is conceptually the clearest form
of environment spanning, it cannot be considered consistent with our
objective of staying within the confines of available methodology.

A second spanning mechanism is implicit in multiprocessing
environments (for our purposes, it is immaterial whether the processes
are in fact executing in parallel or the concurrence is simulated).
Here, as depicted in Figure 13, separate tasks are created
representing the sequential and applicative environments. Remote
activation and the passing of functional values are handled through
the semaphore/mailboxing facilities of the underlying system.
Unfortunately, not all multiprocessing environments allow the
spontaneous creation of applicative tasks. In these cases, the most
likely alternative is to construct the applicative environment as
though it were the only means of processing; then, using the
intersequencing method described next, invoke a sequential module
which effectively spawns the second environment. This would allow
concurrent processing on a simplified level.
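
The mailbox style of spanning can be sketched with two cooperating
tasks. This is a minimal illustration under stated assumptions, not a
description of any system named in this report: Python threads and
queue.Queue objects stand in for the tasks and mailboxes of the
underlying system, and the names requests, replies, sequential_task,
remote_mean, and applicative_task are hypothetical. Each remote
activation passes a request and receives a single value in return.

```python
import queue
import threading

requests = queue.Queue()   # mailbox: applicative task -> sequential task
replies = queue.Queue()    # mailbox: sequential task -> applicative task

def sequential_task():
    """Serve numeric requests until a None sentinel arrives."""
    while True:
        message = requests.get()
        if message is None:
            break
        total = 0.0
        for value in message:          # iterative, state-based computation
            total += value
        replies.put(total / len(message))

def remote_mean(values):
    """Remotely activate the sequential task and wait for its single value."""
    requests.put(list(values))
    return replies.get()

def applicative_task(clusters):
    # Recursive, value-oriented processing that delegates the arithmetic.
    if not clusters:
        return []
    return [remote_mean(clusters[0])] + applicative_task(clusters[1:])

worker = threading.Thread(target=sequential_task)
worker.start()
means = applicative_task([[1.0, 2.0, 3.0], [10.0, 20.0]])
requests.put(None)         # shut the sequential task down
worker.join()
```

Whether the two tasks truly run in parallel is immaterial to the
interface; only the request and reply mailboxes are shared.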

The third method has no particular system requirements other
than the ability for one environment to invoke modules created in the
other; this is true of most operating systems which support both
symbolic and procedural language implementations. Intersequenced
environment spanning establishes a master processing environment which
invokes elements of the slave environment when needed (see Figure 14).
Restrictions on the interfacing between symbolic and procedural
language modules in most operating systems require that the master be
whichever component includes a language environment layer, normally
the applicative. Under VAX/VMS, for example, a LISP program can
directly invoke modules written in either symbolic or procedural
languages, but no procedural language can invoke LISP modules since
there is no way of establishing the needed LISP environment layer.
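
The one-directional nature of intersequencing can be sketched as
follows. This is an analogy in Python rather than an actual VAX/VMS
or LISP interface: the class MasterEnvironment, its methods
register_foreign and evaluate, and the routine smooth are hypothetical
names. The master owns the symbol table (standing in for the language
environment layer) and invokes slave routines as ordinary procedures;
the slave has no handle on the master and so cannot call back into it.

```python
def smooth(samples):
    """Slave module: a plain procedural routine, unaware of the master."""
    result = []
    for i in range(len(samples)):
        lo = max(0, i - 1)
        hi = min(len(samples), i + 2)
        window = samples[lo:hi]
        result.append(sum(window) / len(window))
    return result

class MasterEnvironment:
    """Master: owns the language environment layer and the symbol table."""

    def __init__(self):
        self.symbols = {}

    def register_foreign(self, name, procedure):
        self.symbols[name] = procedure

    def evaluate(self, expr):
        # expr is a literal, or a tuple (symbol, argument expressions...).
        if not isinstance(expr, tuple):
            return expr
        procedure = self.symbols[expr[0]]
        return procedure(*[self.evaluate(arg) for arg in expr[1:]])

master = MasterEnvironment()
master.register_foreign("smooth", smooth)
master.register_foreign("peak", max)
value = master.evaluate(("peak", ("smooth", [1.0, 4.0, 1.0])))
```

Each slave invocation returns a single value to the expression being
evaluated, which matches the interface restriction described earlier.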

Which of the three solutions is selected obviously depends on
the nature of the available hardware. The actual implementation
method could be transparent to the software designer/implementor if an
environment spanning interface were developed. This would
automatically reformulate communicating components as needed to
conform to the specific requirements of the spanning mechanism.

It should be noted that environment spanning does not impose any
relative balance of processing on the two systems (although within an
environment more typical multiprocessing may, of course, be going on).
The issue here is to partition program activities, not in order to
distribute processing according to any particular pattern, but rather
to allow the expression and ultimate execution of each task in the
environment most appropriate to it. Optimization potential is limited
by a combination of factors, each related to the computational
paradigm chosen for the implementation. By realistically assessing
the characteristics of each program component and assigning it to an
appropriate processing environment, the designer/implementor maximizes
the effectiveness of optimization efforts throughout the
implementation solution system.

5 Conclusions


The stringent constraints imposed on computing systems in the
BM/C³ setting have forced the issues of program performance on the
artificial intelligence community. Can AI algorithms be implemented
efficiently using available systems and methodologies? The answer
requires an objective assessment of the difficulties inherent in the
transition from problem conceptualization to physical reality. The
implementation system, which provides this transition through
successive interpretations of the problem solution, has a profound
impact on performance that simply cannot be ignored.

The  immediate  effect  of  the  implementation  system  is  that  it 
establishes  a  processing  environment  within  which  the  problem  solution 
is  expressed.  Two  alternatives  are  currently  available,  sequential 
and applicative. They present conflicting views of the underlying
architecture  as  von  Neumann  or  not,  but  this  is  not  their  most 
important  difference.  The  essence  of  the  processing  environment  is 
that  it  establishes  a  computational  paradigm  which  shapes  the 
development  and  ultimate  performance  of  any  program  executing  within 
it. 

The sequential processing environment views a problem solution
in terms of the Turing model of computation, which isolates program
control from data values in immutable and mutable stores,
respectively. The procedural languages provide a natural expression
of this approach by treating program symbols as references to storage
locations and limiting processing to the sequential or conditional
execution of basic operations. In terms of performance improvement,
the factors of greatest influence are early binding, which permits the
propagation of values on a local and/or global level to minimize
recomputation, and a heavy dependence on iteration, which allows the
movement of instructions from more to less frequently executed
portions of the control store. The major obstacles to optimization
are the dangers posed by side effects and aliasing and the difficulty
of recognizing potentially parallel operations.
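
The movement of instructions out of frequently executed regions can
be illustrated by a hand-applied example of loop-invariant code
motion. This is a sketch in Python with hypothetical function names;
an optimizing compiler for a procedural language would perform the
equivalent rewrite automatically once it has established that the
hoisted expression cannot change inside the loop.

```python
import math

def scale_before(samples, gain_db):
    out = []
    for s in samples:
        # The gain factor is recomputed on every iteration.
        out.append(s * math.pow(10.0, gain_db / 20.0))
    return out

def scale_after(samples, gain_db):
    # Loop-invariant code motion: the factor depends only on gain_db,
    # so it is computed once, outside the frequently executed loop body.
    factor = math.pow(10.0, gain_db / 20.0)
    out = []
    for s in samples:
        out.append(s * factor)
    return out
```

The rewrite is safe only because nothing in the loop alters gain_db,
which is exactly the kind of guarantee that side effects and aliasing
can undermine.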

The applicative processing environment, on the other hand,
exemplifies the recursion theory approach to computability, viewing
control and data elements uniformly as values. A symbolic language,
which expresses processing in terms of functional composition and
recursion, is the most natural descriptive tool for this model.
Unlike their sequential counterparts, applicative programs permit no
side effects or aliasing and possess an implicit concurrency which
makes them admirable targets for parallelization. Their dependence on
recursion and late binding, however, seriously hampers other attempts
at optimization.

In terms of the implementation of AI algorithms, neither
sequential nor applicative processing can be said to be unequivocally
superior to the other. Each approach is inherently suited to a
specific set of problems and inappropriate for others. This is
reflected in the fact that virtually all existing algorithms were
developed within a particular environment and that transformations
from one to the other are difficult and generally inefficient. The
issue here is not the promotion of particular languages,
implementation schemes, or computation systems. It is, rather, that
the complex real-time systems required for AI processing in the BM/C³
setting combine some intrinsically sequential features with others
that are intrinsically non-sequential. The use of a homogeneous
implementation system is like choosing a broom instead of a shovel for
a job that requires both: it can be done, but not efficiently.

Spanning dissimilar environments facilitates program improvement
by allowing the assignment of subproblems on an individual basis to
whatever system offers the best chance for automated optimization. At
the same time, the strategy takes into account the fact that the
number and speed of operations may have less effect on performance
than design factors such as how pattern data is conceptualized or
which heuristics guide allocation activities. A heterogeneous
environment also maximizes this human optimization potential by
allowing the designer/implementor to express each problem in the most
natural way, without undue concern for the execution details of
interacting solutions. It is to be hoped that an algorithm which in
its entirety is too unwieldy for significant performance improvement
can be reduced to a tractable level by this divide-and-conquer
approach.

Appendix A

Optimization  Techniques  for  Sequential  Environments 


The improvement techniques typically applied by optimizers in
the sequential environment assume a bipartite immutable/mutable
program organization as discussed in Chapter 3. A related assumption
is that there be a clear distinction between symbols referencing
program control units and those defining storage elements (program
data versus problem data). Therefore, with few exceptions, procedural
languages require that user-defined symbols, or identifiers, be bound
to the appropriate location in the control or data segment prior to
execution. This property, which allows the association of symbols to
locations through a static analysis of the program, is called early
binding,* and is a major difference between sequential and
non-sequential processing.

Prior to the application of optimizing transformations, control
flow and data flow analysis are performed to establish the semantic
framework (or "meaning") which must be preserved. The basic semantic
element of a program, which we will call a logical unit, is a maximal
collection of instructions in a textual sequence that are always
executed in order as a single entity.¹ This means that no transfer
occurs to any instruction in a logical unit except the first and that
once the first instruction is executed all others will be executed in
sequential order prior to transfer out of the unit. A program is
divided into logical units by identifying the statements which serve
as entry (branch-in) and exit (branch-out) points.

* "Early binding" is used here to indicate the pre-execution
ability to associate symbols with offsets into a storage area. In the
strictest sense, local variables are bound dynamically, since the
storage area itself is not allocated until the unit's prologue is
activated.

In  control  flow  analysis,  the  program  is  partitioned  into 
logical  units  and  a  flow  graph  is  constructed  whose  nodes  are  the 
units  and  whose  arcs  represent  possible  flow  of  control  between  units. 
The  nodes  are  then  grouped  to  form  logical  intervals,  which  represent 
sequential  series  of  units  dominated  by  a  single  entry  node,  but 
possibly  having  multiple  exits.  The  importance  of  logical  intervals 
is  that  they  may  be  "reduced"  to  single  nodes  to  form  a  new  flow
graph.  This  is  then  partitioned  into  new  logical  intervals  which  are 
subsequently  reduced,  and  so  forth,  allowing  the  analysis  of  —  and 
hence  the  application  of  optimizing  transformations  to  —  successively 
larger  portions  of  the  program. 
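The partitioning step lends itself to a brief illustration. The following Python sketch is not taken from the report: the instruction encoding (pairs of an opcode and an optional branch target) and the names `find_leaders` and `build_flow_graph` are assumptions chosen for the example. It marks entry points, cuts the program into logical units, and records the arcs between them; the formation and reduction of intervals is not attempted.

```python
def find_leaders(instrs):
    """Return the sorted indices of instructions that begin a logical unit."""
    leaders = {0} if instrs else set()
    for i, (op, target) in enumerate(instrs):
        if op in ("goto", "if_goto"):
            leaders.add(target)          # a branch-in point starts a unit
            if i + 1 < len(instrs):
                leaders.add(i + 1)       # so does the instruction after a branch-out
    return sorted(leaders)


def build_flow_graph(instrs):
    """Partition instrs into units [start, end) and return (units, arcs)."""
    leaders = find_leaders(instrs)
    bounds = leaders + [len(instrs)]
    units = [(bounds[k], bounds[k + 1]) for k in range(len(leaders))]
    unit_of = {start: k for k, (start, _) in enumerate(units)}
    arcs = set()
    for k, (_, end) in enumerate(units):
        op, target = instrs[end - 1]     # only the last instruction can branch out
        if op in ("goto", "if_goto"):
            arcs.add((k, unit_of[target]))
        if op != "goto" and end < len(instrs):
            arcs.add((k, unit_of[end]))  # control falls through to the next unit
    return units, arcs


# Hypothetical program: a loop that sums the integers from 10 down to 1.
program = [
    ("assign", None),   # 0: i := 10
    ("assign", None),   # 1: s := 0
    ("if_goto", 6),     # 2: if i = 0 goto 6
    ("assign", None),   # 3: s := s + i
    ("assign", None),   # 4: i := i - 1
    ("goto", 2),        # 5: goto 2
    ("assign", None),   # 6: result := s
]
units, arcs = build_flow_graph(program)
```

For this seven-instruction program the sketch yields four units and four arcs, one of which closes the loop.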

Data flow analysis is concerned with the legitimate configurations
of the data segment. Since D can be represented adequately by
recording the changes brought about in each transition from D_n to
D_{n+1}, the primary unit for data flow analysis is the state vector,
which lists those data elements whose contents are altered by the
instruction corresponding to that state. Consolidated state vectors
can be used to represent the data segment associated with each logical
unit or interval of the program. Local data flow analysis concentrates
on the state vectors of adjacent logical units, while global
analysis correlates alterations to the data configurations which occur
from one interval to another.

¹ Logical units are referred to elsewhere by a variety of names,
including basic blocks, logical blocks, and control groups.

The sections which follow describe the performance improvement
techniques that have been developed for use in sequential processing
environments. The optimizing transformations typically applied to
procedural programs are categorized according to the general type of
analysis required for implementation. This is followed by a section
addressing the question of how much performance can be expected to
improve through the application of sequential optimizations.

A.1  Typical  Optimization  Techniques

The optimizations which are outlined in the following pages all
presuppose at least a minimal level of control and data flow analysis.
For convenience, they are grouped into four general categories²: (1)
expression simplification; (2) code rearrangement; (3) optimization of
data storage areas; (4) target-specific optimizations. Examples are
given of some of the most common techniques. Although they are
portrayed using a sort of pidgin-Pascal, it should be clear that most
of the transformations are independent of any particular language
structure or level of implementation. In general, individual
techniques may be applied at the local and/or global level; the degree
of analysis required for global optimizations, however, constrains
their use in many settings.

² Although there is no standard terminology for optimization
techniques, an attempt has been made here to correlate terms from a
variety of sources.

The first class includes some of the most widely applied
techniques (see Figure 15). Intuitively, expression simplification
deals with improvements to the way in which numerical computations are
specified. It encompasses a large subclass of transformations known
as "constant folding" (also called compile-time computations or
constant expression evaluation): the attempt to perform operations
whose operands and/or results are known at compile time because they
involve numerical constants. A second type of improvement, "common
subexpression elimination", avoids the re-computation of values already
calculated for some earlier operation. "Strength reduction"
substitutes "weak" operations for "strong" ones to improve execution
speed and/or make possible further optimizations; it includes attempts
to reduce the processing needed to calculate array offsets.
"Subexpression reordering" takes advantage of commutative and
associative properties and algebraic identities to reduce temporary
storage needs and to facilitate other transformations. Finally,
"value propagation" eliminates or minimizes the need for storage
transfers by replacing references to an identifier name by references
to its value.
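Three of these transformations can be sketched together on a toy expression tree. The Python fragment below is an illustration only: the tuple encoding of expressions, the `shl` node used to stand for a left shift, and the function name `simplify` are assumptions made for the example, and no claim is made that any particular optimizer works this way.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def simplify(expr, constants):
    """Fold constants and turn multiplication by a power of two into a shift.

    expr is an int, a variable name, or a tuple (op, left, right);
    constants maps the names of compile-time constants to their values.
    """
    if isinstance(expr, str):
        return constants.get(expr, expr)      # value propagation for known constants
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    left, right = simplify(left, constants), simplify(right, constants)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)           # constant folding
    if op == "*" and isinstance(left, int):
        left, right = right, left             # commutativity: constant goes on the right
    if (op == "*" and isinstance(right, int)
            and right > 0 and (right & (right - 1)) == 0):
        return ("shl", left, right.bit_length() - 1)   # strength reduction
    return (op, left, right)


# CONST A = 5;  B := (10*A)-5
folded = simplify(("-", ("*", 10, "A"), 5), {"A": 5})
# A := A*16, with A an ordinary variable
reduced = simplify(("*", "A", 16), {})
```

As the surrounding text notes, the transformations feed one another: here the substitution of the constant `A` is what makes the folding possible.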

The concept of early binding is obviously crucial to these
techniques. An optimizer which applies expression simplifications
must perform a detailed analysis of state vectors to determine when a
value is altered, as well as incorporate mechanisms for associating
algebraic properties with individual instructions. A specific
transformation must normally be applied more than once and in
alternation with other techniques to be truly effective.

Technique                   Before                    After

constant folding            CONST A = 5               B := 45
                            B := (10*A)-5

common subexpression        IF A*B > 10               temp := A*B
elimination                   THEN C := A*B           IF temp > 10
                              ELSE C := -(A*B)          THEN C := temp
                                                        ELSE C := -temp

strength reduction          A := A*16                 A := shift(A, left, 4)

subexpression reordering    A := A*D*B                temp := D*B
                            C := 2*(B*D)              A := A*temp
                                                      C := shift(temp, left, 1)

value propagation           A := 10*(B/2)             A, C := B*5
                            C := B
                            C := B*5

Figure 15. Examples of Expression
Simplification Techniques

The category of code rearrangement, as its name implies,
encompasses transformations designed to improve the ordering of
instructions in the program's code segment; some common examples are
illustrated in Figure 16. The movement of invariant expressions out
of loops, reordering of independent operations (statement flipping),
elimination of induction variables, and hoisting of array offset
calculations from inside loops all represent optimizations commonly
referred to as "code motions". These techniques attempt to minimize
the frequency with which a given operation is performed and are
particularly important when a large number of array references are
used, since offset calculations normally require costly multiplication
operations. "Loop reorganizations" (linearization, fusion, and
unrolling) reformulate loop structures by fully or partially expanding
them to sequential form in order to minimize the number of tests and
branches needed to control iteration. "Boolean minimization" performs
a similar function by reordering comparisons in order to minimize
testing.
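The effect of moving an invariant expression out of a loop can be made concrete with a small before-and-after pair. The Python functions below are hypothetical and written by hand; they only illustrate what an optimizer would do automatically, and the `multiplies` counter is an assumption added to make the saving visible.

```python
def scale_naive(values, a, b):
    """The product a * b is recomputed on every iteration."""
    out, multiplies = [], 0
    for v in values:
        factor = a * b            # loop-invariant expression
        multiplies += 1
        out.append(v + factor)
    return out, multiplies


def scale_hoisted(values, a, b):
    """Same result after the invariant expression is moved out of the loop."""
    factor = a * b                # evaluated once, before the loop
    multiplies = 1
    out = [v + factor for v in values]
    return out, multiplies


naive_out, naive_count = scale_naive([1, 2, 3, 4], 3, 5)
hoisted_out, hoisted_count = scale_hoisted([1, 2, 3, 4], 3, 5)
```

Both versions produce the same list, but the second performs one multiplication where the first performs one per element. Note that the hoisted form still performs its single multiplication when the loop body would never have executed, which is one reason such motions require analysis before they are applied.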

Technique               Before                        After*

code motion             FOR I := 1 TO N DO BEGIN      X := A[N]
                          X := A[N]                   FOR I := 1 TO N DO BEGIN
                          A[I] := ABS(B[I])             A[I] := ABS(B[I])
                        END                           END

                        FOR I := 1 TO 10 DO BEGIN     FOR offset := 0 TO 36 BY 4 DO BEGIN
                          J := I+4                      [X+offset] := [B+offset+16]
                          X[I] := B[I,J]                [Y+offset] := [B+offset+8]
                          Y[I] := B[I,J-2]            END
                        END

loop reorganization     FOR I := 1 TO 10 DO           FOR offset := 0 TO 3996 BY 4 DO
                          FOR J := 1 TO 10 DO           READ([X+offset])
                            FOR K := 1 TO 10 DO
                              READ(X[I,J,K])

boolean minimization    IF A AND (B OR C)                 IF B THEN GOTO L1
                          THEN X := 10                    IF C THEN GOTO L1
                          ELSE IF B OR C                  X := -10; GOTO L3
                            THEN X := 0               L1: IF A THEN GOTO L2
                            ELSE X := -10                 X := 0; GOTO L3
                                                      L2: X := 10
                                                      L3:

* The use of 'offset' is an attempt to indicate the calculation of array
subscript offsets; a size of 4 bytes per element is assumed.

Figure 16. Examples of Code
Rearrangement Techniques

Other types of code rearrangement are difficult to depict
graphically. Code elimination techniques remove redundant
instructions, such as the assignment of a value to a variable which is not
referenced until after a further assignment, or unreachable (dead)
code positioned after a branch-out point but before a corresponding
branch-in. These transformations are often necessary after the
application of other improvements. Because of the high run-time
overhead associated with the prologue and epilogue code of procedure
units, some optimizers also perform procedure integration (in-line
substitution), which replaces each occurrence of an invocation by a
copy of the instructions forming the body of the subprogram.
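The removal of redundant assignments within a single logical unit can be sketched as a backward scan. The Python fragment below is a simplified illustration under stated assumptions: statements are encoded as (target, operands) pairs, evaluation has no side effects, and the set of variables needed after the unit (`live_out`) is supplied by the caller rather than derived by global analysis. The name `eliminate_dead_assignments` is hypothetical.

```python
def eliminate_dead_assignments(unit, live_out):
    """Drop assignments whose value is never referenced before being overwritten.

    unit is a list of (target, operands) pairs in execution order;
    live_out is the set of variables still needed after the unit.
    """
    live = set(live_out)
    kept = []
    for target, operands in reversed(unit):
        if target in live:
            kept.append((target, operands))
            live.discard(target)      # the value needed was produced here
            live.update(operands)     # and its operands are now needed earlier
        # otherwise the assignment is redundant and is dropped
    kept.reverse()
    return kept


unit = [
    ("x", ("a",)),        # x := a       redundant: x is reassigned before any use
    ("y", ("b", "c")),    # y := b + c
    ("x", ("y",)),        # x := y
    ("t", ("x",)),        # t := x       redundant: t is not needed afterwards
]
optimized = eliminate_dead_assignments(unit, live_out={"x"})
```

Only the two assignments that contribute to the final value of `x` survive.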

The third class of improvements, data storage optimizations,
reorders the data segment to minimize program space requirements. The
information encapsulated in the state vectors allows the
identification and elimination of useless variables, such as unreferenced
identifiers or those rendered extraneous through constant propagation
or other optimizations. Live variable analysis reveals the effective
span of individual identifiers so that variables with disjoint
lifetimes may be overlaid in the same storage location. Time improvements
can also be realized by reorganizing data elements to preserve
boundary alignments which result in more efficient data access or to
take advantage of reference adjacencies in order to minimize memory
page faults. Finally, when performed in conjunction with expression
simplification, storage analysis allows the replacement of run-time
assignments of constant values by static (compile-time) initialization
of the storage locations.
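The overlaying of variables with disjoint lifetimes can be illustrated for straight-line code. The Python sketch below makes several simplifying assumptions that are not in the report: a variable's lifetime is taken to run from its first to its last appearance in a single unit, statements are (target, operands) pairs, and storage slots are assigned greedily in order of first appearance. The names `live_ranges` and `overlay` are hypothetical.

```python
def live_ranges(unit):
    """Map each variable to (first, last) statement index at which it appears."""
    ranges = {}
    for index, (target, operands) in enumerate(unit):
        for name in (target, *operands):
            first, _ = ranges.get(name, (index, index))
            ranges[name] = (first, index)
    return ranges


def overlay(ranges):
    """Greedily place variables with disjoint lifetimes in the same slot."""
    slots = []          # slots[k] = last index at which slot k is occupied
    assignment = {}
    for name, (first, last) in sorted(ranges.items(), key=lambda item: item[1]):
        for k, busy_until in enumerate(slots):
            if busy_until < first:        # slot k is free before this lifetime starts
                slots[k] = last
                assignment[name] = k
                break
        else:
            slots.append(last)            # no free slot: open a new one
            assignment[name] = len(slots) - 1
    return assignment


unit = [
    ("a", ()),          # 0: a := input
    ("b", ("a",)),      # 1: b := f(a)
    ("c", ("b",)),      # 2: c := g(b)
    ("d", ("c",)),      # 3: d := h(c)
]
assignment = overlay(live_ranges(unit))
```

In this example four variables fit into two storage locations, since `a` and `c` are never live at the same time and neither are `b` and `d`.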


[...] techniques since they are often directly incorporated in translation
mechanisms. These include the use of improvement algorithms in such
activities as register allocation, target instruction generation, and
instruction scheduling. "Peephole optimization", applied during the
last stages of target code generation, analyzes short sequences of
code and attempts to reorder or eliminate instructions. For example,
multiple instructions such as cascaded branches or redundant condition
tests can be combined into a single operation having the same effect.
The substitution of target-specific instructions which are shorter in
format or execute faster is also of value at this level.
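The collapsing of cascaded branches is perhaps the simplest peephole improvement to demonstrate. The Python sketch below is an illustration under assumptions of its own: target code is modeled as a mapping from a label to the single instruction found at that label, and the function name `collapse_branch_chains` is hypothetical. Real peephole optimizers operate on a sliding window of machine instructions rather than on such a table.

```python
def collapse_branch_chains(code):
    """Retarget every unconditional branch that lands on another one.

    code maps a label to an instruction: ("goto", label) or ("op", text).
    """
    def final_target(label, seen=()):
        instr = code.get(label)
        if instr is not None and instr[0] == "goto" and label not in seen:
            return final_target(instr[1], seen + (label,))
        return label      # a non-branch instruction, or a cycle, ends the chain

    return {
        label: ("goto", final_target(instr[1])) if instr[0] == "goto" else instr
        for label, instr in code.items()
    }


code = {
    "L1": ("goto", "L2"),
    "L2": ("goto", "L3"),
    "L3": ("op", "X := 0"),
}
collapsed = collapse_branch_chains(code)
```

After the pass, the branch at `L1` transfers directly to `L3`, saving one jump each time it is executed.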

In summary, a wide range of techniques has been established for
optimizing programs in the sequential processing environment.
Unfortunately, some of the techniques are self-defeating, if not actually
contradictory, when used in combination with others. An optimizer's
effectiveness depends to a great extent on the successful interplay of
a variety of techniques. Most existing versions limit their
activities to a relatively small number of transformations sharing similar
analysis needs and having significant impact on whatever types of
input programs are deemed typical.

A.2  Potential  for  Optimization

Although optimizing compilers of varying degrees of
sophistication have been available for twenty years, there are few empirical
studies indicating to what degree they can improve performance. As
mentioned in Chapter 2, the first difficulty is establishing just what
to measure, i.e., what constitutes an average program, average
run-time load, average input, etc. A second problem is how to isolate
the effects of a particular improvement when the interaction of
transformations is so critical to their success. The issue is further
complicated by the general inadequacy of available methods for
measuring run-time behavior and the unintelligibility of the results.
In short, the literature is full of references to optimization
techniques but there is a noticeable lack of correlation between theory
and practice, and few statistically significant findings.

Knuth made the first attempt at compiling program statistics
when he compared FORTRAN code written by Stanford students with that
of Lockheed programmers, using both static and dynamic analysis
techniques. This study [Knuth 71] remains one of the most extensive
to date, but the results are of questionable use because of the heavy
bias due to the syntax of early FORTRAN. [Elshoff 76] and [Sarraga
84] performed similar analyses of General Motors programs written in
PL/I, while [Robinson 75] and [Zelkowitz 76] provide the best examples
to date of academic programs (written in FORTRAN and PL/I,
respectively). Since reasonably scaled analyses of other programming
languages are not available, only those findings relevant to generally
applicable optimization techniques will be cited here.

The most widely quoted statistic is Knuth's "90/10 rule",
which stated that 90% of total time was spent executing just 10% of a
program's statements. Input/output operations were found to consume
an inordinate share of processing, with 5% of the code accounting for
more than 25% of measured time. In terms of non-I/O code, under 4%
occupied 50% of execution. These figures imply that significant
improvement might be realized if optimizing efforts can be
concentrated in the proper areas.
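The arithmetic behind this implication can be worked through directly. The Python fragment below is not drawn from the studies cited; it simply assumes that the heavily executed portion of a program can be made faster by some factor while the remainder is left unchanged, and computes the fraction of the original run time that remains. The function name `remaining_time` and the speedup factors are hypothetical.

```python
def remaining_time(hot_fraction, speedup):
    """Fraction of the original run time left after speeding up the hot code.

    hot_fraction is the share of run time spent in the code being improved;
    speedup is the factor by which that code alone is made faster.
    """
    return (1.0 - hot_fraction) + hot_fraction / speedup


# 90/10 rule: the statements that take 90% of the time are made twice as fast.
doubled = remaining_time(0.90, 2.0)      # 0.10 + 0.45, i.e. about 0.55
# The same statements made ten times as fast.
tenfold = remaining_time(0.90, 10.0)     # 0.10 + 0.09, i.e. about 0.19
# Under 4% of non-I/O code occupying 50% of execution, made twice as fast.
half = remaining_time(0.50, 2.0)         # 0.50 + 0.25 = 0.75
```

Under these assumptions, effort spent on a small part of the program yields most of the available gain, but the untouched remainder sets a floor that no degree of improvement to the hot code can lower.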

Some of the statistics provided by static analysis are less
encouraging. [Knuth 71] found that 68% of assignment statements were
simple replacements which copied a value from one location to another;
these results were confirmed by [Elshoff 76], who reported 77.6% and
40%, respectively. The same sources cite an additional 12%, 21%, and
29% as assignments involving the evaluation of no more than one
operator. In terms of optimization potential, this indicates that
even sophisticated expression simplification techniques may have
negligible effects on performance. This view is confirmed by another
researcher [Carter 82], who found that most blocks in Pascal programs
included only two to four assignments and fewer than two common
subexpressions. It seems realistic to estimate that while expression
simplification and code rearrangement might save up to three-quarters
of the time spent by numerical computation-intensive programs, the
same techniques would probably show little effect on non-numeric
programs.

The elimination of redundant assignments and useless variables
seems more promising. [Elshoff 76] reported that of 384 identifiers
in an average program, 10% were unreferenced. [Sarraga 84] performed
a partial analysis of variable use which indicated that some 5% of
assignments were useless. At the same time, other figures compiled by
Elshoff underscore the difficulty of performing the global data flow
analysis needed for this type of optimization: he found that 13% of
the gaps between successive references to a single identifier were
more than 100 statements in length.

The implementors of optimizing compilers have on occasion
published data indicating the degree of improvement measured by applying
varying levels of optimization. Figure 17 illustrates the results
cited by [Cocke 80] and [Brownsmith 84], compared with the
improvements implemented manually by [Knuth 71]. The effects of the
language-independent VAX-11 back-end optimizer designed by [Anklam 82]
and currently used by the PL/I, C, and PEARL compilers, presented in
Figure 18, were measured by inhibiting individual transformations on a
series of benchmarks. [Wulf 75] attempted to quantify the effect on
performance of each optimization performed by a Bliss-11 compiler (see
Figure 19); the intention was to derive a formula expressing the
cumulative result of varying combinations, but this did not prove to
be practicable.

As the figures show, there is a significant range in performance
from one optimizer to the next and from one benchmark to another. In
some cases the effects of individual transformations almost escape
measurement (the effects of loop invariant relocation and
subexpression elimination on benchmark AB-5 in Figure 18, for example),
while in others efficiency increases dramatically with the addition of
a single technique (e.g., the effect of loop invariant relocation on
AB-4 of the same figure). The extreme variability of these results
illustrates the difficulty of realistically predicting the effects of
isolated improvements.

                                        Level of Performance (%)
Technique               Measure      Minimum    Average    Maximum

Local Optimizations¹
  Knuth                 time            40         71         91
  Cocke                 time            32         50         72
                        space           42         54         69
  Brownsmith            time            15         63         99

Local plus Global Optimizations²
  Knuth                 time            11         28         91
  Cocke                 time            19         42         61
                        space           38         55         66

¹ Included local constant propagation, elimination of dead code, and
local register allocation optimization
² Added global constant propagation, strength and frequency reduction,
and global register allocation optimization

Figure 17. Estimated Effects of Optimization

Technique                          Effect Factor

Constant folding                       0.938
Common subexpression elimination
  - statement level                    0.987
  - local level                        0.973
  - global level                       0.987
Algebraic laws                         0.975
Code motion                            0.985
Elimination of dead code               0.98
Register allocation
  - local                              0.987
  - global                             0.975
Cross jumping                          0.972
Peephole optimization                  0.88

Figure 19. Wulf's Quantification of
the Effects of Optimizing Techniques

No discussion of optimization would be complete without mention
of parallelization. Recent years have seen an increasing interest in
the development of translation algorithms for converting sequential
programs to versions suitable for parallel processing. Available
methods evolved from data flow analysis techniques and focus on array
operations and looping structures as the primary candidates for
parallelization. For example, the loop distribution algorithm for
extracting parallel code assigns individual iterations of a loop to
different processors. The pipelining algorithm, on the other hand,
splits the loop into several component sub-loops, each of which is
then assigned to a processor. In general, loop distribution is
preferred when the loop body is small and the number of iterations
large; pipelining is employed when the proportions are reversed.
Unfortunately, a substantial amount of analysis is required to
implement these techniques, and they are not uniformly applicable to all
types of data elements and looping constructs. Furthermore, no
conclusive empirical studies of the degree of improvement realized
through parallelization have emerged to date.

Appendix B

Optimization  Techniques  for  Applicative  Environments 

As pointed out in Chapter 3, optimizations based on normal
control flow and data flow analysis are inappropriate in
non-sequential environments and few optimizing transformations have as yet
been developed specifically for applicative processing situations.
Those that are available can be generally categorized as affecting
either substitution or simplification activities.

The following sections discuss optimizations currently
implemented in applicative environments. As in Appendix A, available
techniques are described in general terms and then the results of
studies examining the effectiveness of improvement activities are
presented.

B.1  Typical  Optimization  Techniques

Substitution activities are optimized by improvements in heap
storage management. A heap is difficult to implement efficiently,
since it requires that a large, general-purpose storage area be made
available for use on an unstructured, by-need basis. When a program
element is defined, space is allocated to it from a free-space list
and associated with the corresponding symbol by means of one or more
levels of pointers. If the element is subsequently redefined, the
pointer or chain is altered to point to a new location. Note that
there is theoretically no limit to the number of pointers which can
reference the same object. The term garbage refers to a location
which is no longer referenced by any pointer, and should therefore be
placed on the free-space list. A dangling reference occurs when the
object is returned prematurely to the free-space list, even though one
or more pointers still refer to it. Traditional heap storage systems
avoid dangling references by creating a unique object at each
definition. This allows garbage to accrue rapidly; when the
free-space list is exhausted, computation is suspended while a garbage
collector searches the heap area, identifies garbage elements, and
returns them to the list.
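The cycle of allocation, exhaustion, and collection described above can be sketched with a toy heap. The Python class below is an illustration only: the mark-and-sweep strategy shown is one conventional way of identifying garbage and is an assumption of the example, as are the class name `Heap`, its methods, and the representation of each cell as a list of the cell indices it points to.

```python
class Heap:
    """Toy heap: each cell holds the list of cell indices it points to."""

    def __init__(self, size):
        self.cells = [None] * size
        self.free = list(range(size))        # the free-space list

    def allocate(self, pointers=()):
        if not self.free:
            raise MemoryError("free-space list exhausted; collect garbage first")
        index = self.free.pop()
        self.cells[index] = list(pointers)
        return index

    def collect(self, roots):
        """Mark cells reachable from roots; return the rest to the free list."""
        marked, stack = set(), list(roots)
        while stack:
            index = stack.pop()
            if index not in marked:
                marked.add(index)
                stack.extend(self.cells[index])
        for index, cell in enumerate(self.cells):
            if cell is not None and index not in marked:
                self.cells[index] = None     # garbage: no pointer reaches it
                self.free.append(index)
        return len(self.free)


heap = Heap(4)
a = heap.allocate()
b = heap.allocate([a])       # b points to a
c = heap.allocate()
d = heap.allocate([c])       # d points to c
# Only b is still referenced by the program, so c and d are garbage.
reclaimed = heap.collect(roots=[b])
```

Because a cell is returned to the free-space list only when nothing reachable points to it, this scheme cannot create a dangling reference, at the price of suspending computation for the duration of `collect`.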

Garbage collection is clearly an attractive candidate for
optimization. Established techniques include the use of hashed
reference-count tables to keep track of active storage and/or the
subdivision of storage into static, read-only, and heap areas, which
somewhat reduces the area to be collected. Incremental ("on the fly")
collection processes a small section of heap storage each time a
specific operation is performed; this distributes the overhead more
evenly over time but requires more space than other methods.
"Compile-time garbage collection" attempts to replace some operations
which create new definitions with operations that alter the existing
pointer links, but destructive changes of this type do not preserve
equivalence with respect to multiple pointers referencing the same
object.

A second type of optimization related to substitution is
peculiar to LISP-based implementations. Most versions of LISP
accommodate one or more special types of dynamic or "fluid" variable
binding. With traditional binding (deep binding), each time the fluid
variable is referenced a tree of symbol tables must be searched to
find the current value. The observation that the number of rebindings
is small compared to the magnitude of the search led to a technique
called shallow binding, whereby a current value cell is maintained for
each name. The old value is stacked whenever a new instance is bound
so that it can be restored easily when needed.
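The difference between the two binding schemes can be shown side by side. The Python classes below are a simplified sketch: deep binding is modeled with a linear stack of (name, value) pairs rather than a tree of symbol tables, and the class and method names are assumptions made for the example.

```python
class DeepBinding:
    """Each binding is pushed on a stack; lookup searches from the newest."""

    def __init__(self):
        self.stack = []

    def bind(self, name, value):
        self.stack.append((name, value))

    def unbind(self):
        self.stack.pop()

    def lookup(self, name):
        for bound_name, value in reversed(self.stack):   # cost grows with depth
            if bound_name == name:
                return value
        raise NameError(name)


class ShallowBinding:
    """One current-value cell per name; old values are stacked on rebinding."""

    _UNBOUND = object()

    def __init__(self):
        self.cells = {}
        self.saved = []

    def bind(self, name, value):
        self.saved.append((name, self.cells.get(name, self._UNBOUND)))
        self.cells[name] = value

    def unbind(self):
        name, old = self.saved.pop()
        if old is self._UNBOUND:
            del self.cells[name]
        else:
            self.cells[name] = old                       # restore the old value

    def lookup(self, name):
        return self.cells[name]                          # no search required
```

Both classes give the same answers for the same sequence of operations; the shallow version moves the work from every reference to the comparatively rare act of binding and unbinding.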

Additional techniques have been developed to improve
simplification activities. Open coding involves the in-line expansion of
common primitive functions and/or conditional structures (similar to
procedure integration in the sequential environment). Calling
sequence improvements are designed to expedite linkages between
subprogram units. These make use of jump vectors, local branches, and
linkage tables to eliminate calls to primitive linking routines.

A  related  method  of  diminishing  simplification  overhead  is  the 
removal  of  recursion.  This  is  appropriate  when  the  recursion  is 
duplicated  so  that  the  same  values  are  computed  more  than  once;  such 
techniques  are  similar  to  those  for  lazy  evaluators  (see  below).  A 
second  use  is  in  functions  with  "tail  recursion",  where  the  recursion 
is  the  last  action  of  the  current  invocation.  Since  there  is  no  need 
to  establish  a  new  application  frame,  the  recursion  is  replaced  by 


Other schemes improve the ways in which parameters are passed
between functions, in an attempt to decrease the number of times a
symbol is evaluated. Parameter rearrangements reorder arguments or
place them in registers or on special parameter stacks. Call by name
delays the evaluation of parameters until they are actually required,
while lazy evaluation (call by need) maintains a table of values to
avoid duplicate evaluations.
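The distinction between call by name and call by need can be illustrated with a delayed parameter object. The Python sketch below is hypothetical: the class `Thunk`, its `memoize` flag, and the evaluation counter are assumptions introduced for the example, and the cached value is kept inside each object rather than in a separate table of values.

```python
class Thunk:
    """A delayed parameter.

    With memoize=False it behaves like call by name (re-evaluated at each use);
    with memoize=True it behaves like call by need (evaluated at most once).
    """

    def __init__(self, compute, memoize):
        self.compute = compute
        self.memoize = memoize
        self.evaluations = 0
        self._has_value = False
        self._value = None

    def force(self):
        if self.memoize and self._has_value:
            return self._value               # reuse the saved value
        self.evaluations += 1
        value = self.compute()
        if self.memoize:
            self._value, self._has_value = value, True
        return value


def use_twice(param):
    """A function body that happens to reference its parameter twice."""
    return param.force() + param.force()


def never_used(param):
    """A function body that never references its parameter."""
    return 0


by_name = Thunk(lambda: 6 * 7, memoize=False)
by_need = Thunk(lambda: 6 * 7, memoize=True)
skipped = Thunk(lambda: 6 * 7, memoize=True)
results = (use_twice(by_name), use_twice(by_need), never_used(skipped))
```

Both delayed forms avoid evaluating a parameter that is never required; only the call-by-need form also avoids evaluating it a second time.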

Arithmetic operations in the applicative environment are
complicated by the need to convert numeric values to and from pointer
representations. Common improvements include storing the values in
registers, on special numeric stacks, or in tables. The complexity of
numeric operations has also led to techniques similar to those used by
sequential optimizers, such as constant folding, rearrangement, common
subexpression elimination, and peephole optimization, but on a
considerably smaller scale.

In summary, although some optimizations have been developed for
the applicative environment, they are not as well understood as are
the techniques discussed in the last chapter. Few compilers attempt
to incorporate more than a handful of improvements and their
interrelationships are only hazily defined. The most discouraging fact is
that only a small percentage of existing implementations offer any
significant degree of optimization.


B.2  Potential  for  Optimization 


For the reasons outlined in previous sections, it comes as
little surprise that no empirical studies of statistical significance
have as yet appeared to establish the effectiveness of optimization in
the applicative environment. Most recent research efforts have been
directed instead to the development of architectures which process
applicative programs directly rather than via software simulation.
One recent study, however, compared the performance of a series of
LISP implementations, including some employing improvement techniques
[Gabriel 85].

Figure 20 summarizes the results of the Gabriel study on three
systems using a Franz Lisp compiler. The only optimizations described
are calling sequence improvements: the use of a J(ump)S(u)B(routine)
instruction to perform direct jumps to functions included as part of
the same compilation unit, and the incorporation of a transfer vector
to replace the invocation primitive routine. It should be noted that
all of the benchmarks tested were task-specific, and therefore are
subject to the biases described in Section 2.2.

The results illustrate what appears to be a chronic problem with
applicative optimizations. Although quite substantial improvements in
run-time behavior are noted for some tests, performance is actually
degraded in other cases. [Bruynooghe 84] reports similar results for
the application to PROLOG programs of a technique called "intelligent
backtracking". His tests were run on typical small-scale AI problems,
with results that ranged from 0.3 to 219 (with an average of 112)
percent of the time required for the unimproved versions. This
clearly violates the fundamental rule that an optimizing
transformation be guaranteed at least not to adversely affect performance.

[Figure 20 tabulates the minimum, average, and maximum level of
performance* on the VAX 11/750, VAX 11/780, and Sun II for the
benchmark groups listed below; the numeric entries are illegible in
the source.]

  Recursion         Tak, Stak, Ctak, and Takl
  Knowledge Base    Boyer and Browse
  Trees             Destructive, Traverse_Initialization, and Traverse
  Searching         Puzzle
  (unlabeled)       File_Print, File_Read, and Terminal_Print

* Garbage collection was excluded from the time calculation.

Figure 20. Gabriel's Benchmarks
on the Effects of Optimization

The  shift  from  sequential  to  non-sequential  architectures,  on 
the  other  hand,  seems  encouraging.  Any  program  performs  significantly 
better  once  the  software  interpreting  layers  are  eliminated  from  the 
applicative  environment.  The  incorporation  of  parallelism  will 
undoubtedly  improve  this  situation  even  more,  since  the  locality  of 
effect  and  referential  transparency  properties  of  symbolic  programs 
make  them  apt  candidates  for  parallelization.  Furthermore,  the 
"generator"  primitives  of  the  functional  languages  (e.g.,  the  MAP 
routines  of  LISP)  are  implicitly  parallel  constructs  which  can  easily 
be adapted to concurrent processing situations. Finally, garbage
collection activities have already been targeted for implementation on
separate, dedicated processors, with the promise of substantial
improvements in execution time.